Persistent Link:
http://hdl.handle.net/10150/297040
Title:
Top-Down Bayesian Modeling and Inference for Indoor Scenes
Author:
Del Pero, Luca
Issue Date:
2013
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
People can understand the content of an image without effort. We can easily identify the objects in it and figure out where they are in the 3D world. Automating these abilities is critical for many applications, such as robotics, autonomous driving, and surveillance. Unfortunately, despite recent advances, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to one of three principal ones. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the fact that 3D objects do not intersect. This ensures that the predicted image interpretation is globally coherent, both geometrically and semantically, which helps resolve the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling. Our inference method combines bottom-up evidence from the data with top-down knowledge of the 3D world in order to explore the vast output space efficiently. Comprehensive evaluation confirms our intuition that global inference of the entire scene is more effective than estimating its individual elements independently. Further, our experiments show that our approach is competitive with, and often exceeds, state-of-the-art methods.
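The inference strategy described in the abstract (a generative scene model fit with MCMC sampling) can be illustrated with a minimal, hypothetical Metropolis-Hastings sketch. Everything below, including the single "room width" parameter, the Gaussian prior, and the toy edge-measurement likelihood, is an assumption made purely for illustration and is not the model or code from the dissertation.

```python
# Illustrative sketch only: a toy Metropolis-Hastings loop in the spirit of
# top-down MCMC scene fitting. The scene state, prior, and likelihood are
# hypothetical stand-ins, not the dissertation's actual model.
import math
import random

def log_prior(room_width):
    # Hypothetical prior: room widths roughly normal around 4 m (sd 1 m).
    return -0.5 * ((room_width - 4.0) / 1.0) ** 2

def log_likelihood(room_width, observed_edge=3.5):
    # Hypothetical image likelihood: the rendered room edge should match an
    # observed edge measurement (noise sd 0.2 m).
    return -0.5 * ((room_width - observed_edge) / 0.2) ** 2

def metropolis_hastings(n_iters=5000, step=0.3, seed=0):
    rng = random.Random(seed)
    state = 4.0                                   # initial scene hypothesis
    log_post = log_prior(state) + log_likelihood(state)
    samples = []
    for _ in range(n_iters):
        proposal = state + rng.gauss(0.0, step)   # random proposal move
        log_post_new = log_prior(proposal) + log_likelihood(proposal)
        # Accept with probability min(1, posterior ratio): top-down evaluation
        # of how well the proposed scene explains the prior and the data.
        if rng.random() < math.exp(min(0.0, log_post_new - log_post)):
            state, log_post = proposal, log_post_new
        samples.append(state)
    return samples

if __name__ == "__main__":
    samples = metropolis_hastings()
    print("posterior mean room width:", sum(samples) / len(samples))
```

In the full problem the state would be an entire scene hypothesis (room box, camera, object blocks and their labels) rather than a single scalar, and proposals would mix random perturbations with data-driven (bottom-up) moves, but the accept/reject structure is the same.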
Type:
text; Electronic Dissertation
Keywords:
Bayesian inference; Computer Vision; Indoor scenes; Object recognition; Scene understanding; Computer Science; 3D reconstruction
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Computer Science
Degree Grantor:
University of Arizona
Advisor:
Barnard, Kobus

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Top-Down Bayesian Modeling and Inference for Indoor Scenes | en_US
dc.creator | Del Pero, Luca | en_US
dc.contributor.author | Del Pero, Luca | en_US
dc.date.issued | 2013 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | People can understand the content of an image without effort. We can easily identify the objects in it and figure out where they are in the 3D world. Automating these abilities is critical for many applications, such as robotics, autonomous driving, and surveillance. Unfortunately, despite recent advances, fully automated vision systems for image understanding do not exist. In this work, we present progress restricted to the domain of images of indoor scenes, such as bedrooms and kitchens. These environments typically have the "Manhattan" property that most surfaces are parallel to one of three principal ones. Further, the 3D geometry of a room and the objects within it can be approximated with simple geometric primitives, such as 3D blocks. Our goal is to reconstruct the 3D geometry of an indoor environment while also understanding its semantic meaning, by identifying the objects in the scene, such as beds and couches. We separately model the 3D geometry, the camera, and an image likelihood, to provide a generative statistical model for image data. Our representation captures the rich structure of an indoor scene by explicitly modeling the contextual relationships among its elements, such as the typical size of objects and their arrangement in the room, and simple physical constraints, such as the fact that 3D objects do not intersect. This ensures that the predicted image interpretation is globally coherent, both geometrically and semantically, which helps resolve the ambiguities caused by projecting a 3D scene onto an image, such as occlusions and foreshortening. We fit this model to images using MCMC sampling. Our inference method combines bottom-up evidence from the data with top-down knowledge of the 3D world in order to explore the vast output space efficiently. Comprehensive evaluation confirms our intuition that global inference of the entire scene is more effective than estimating its individual elements independently. Further, our experiments show that our approach is competitive with, and often exceeds, state-of-the-art methods. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
dc.subject | Bayesian inference | en_US
dc.subject | Computer Vision | en_US
dc.subject | Indoor scenes | en_US
dc.subject | Object recognition | en_US
dc.subject | Scene understanding | en_US
dc.subject | Computer Science | en_US
dc.subject | 3D reconstruction | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Barnard, Kobus | en_US
dc.contributor.committeemember | Barnard, Kobus | en_US
dc.contributor.committeemember | Cohen, Paul | en_US
dc.contributor.committeemember | Efrat, Alon | en_US
dc.contributor.committeemember | Morrison, Clayton | en_US
This item is licensed under a Creative Commons License