The goal of image understanding has long been a shared goal in the field of computer vision. Extracting the required scene-level information from image data is a formidable task, however. While the raw data comes in the form of matrices of numbers, the inferences that we must perform occur at a much higher level of abstraction. Much progress has been made in recent years in extracting the primitives of an image in isolation, for example detecting the objects, labeling the regions, or extracting the surfaces. Modeling the interactions and fine-grained distinctions between these primitives is an important next step along the path to scene understanding. Because this involves reasoning about relationships between heterogeneous entities at a high level of abstraction, this problem lends itself well to the tools of probabilistic graphical models.