Identifying Latent Attributes from Video Scenes Using Knowledge Acquired From Large Collections of Text Documents

Persistent Link:
http://hdl.handle.net/10150/332735
Title:
Identifying Latent Attributes from Video Scenes Using Knowledge Acquired From Large Collections of Text Documents
Author:
Tran, Anh Xuan
Issue Date:
2014
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Peter Drucker, an influential writer and philosopher in the field of management theory and practice, once claimed that “the most important thing in communication is hearing what isn't said.” A similar principle holds in the context of video scene understanding. In almost every non-trivial video scene, the most important elements, such as the motives and intentions of the actors, cannot be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter. In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information, and that their combination significantly outperforms both the individual models and several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.
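Illustration:
As a rough sketch of the approach the abstract describes, the Python snippet below retrieves passages from a toy text collection that match a minimal query context (an activity and an actor type), extracts candidate mental-state terms, and ranks them by their distribution weight. The corpus, the lexicon, and all names here are hypothetical stand-ins, not the dissertation's actual data or models, and the semantic-relatedness component of the measure of merit is omitted for brevity.

# Hypothetical sketch of the retrieve-and-rank idea from the abstract.
# The toy corpus and mental-state lexicon below are illustrative
# assumptions, not the dissertation's data or models.
from collections import Counter

# Toy background collection standing in for a large text corpus (assumption).
CORPUS = [
    "The runner felt determined and focused as the race began.",
    "Spectators were excited, and the runner stayed focused on the finish.",
    "After falling, the runner was frustrated but still determined.",
]

# Small illustrative lexicon of mental-state terms (assumption).
MENTAL_STATE_TERMS = {"determined", "focused", "excited", "frustrated", "anxious"}

def rank_mental_states(context_terms, corpus, lexicon):
    """Rank mental-state terms by relative frequency within passages
    that mention the query context (e.g., activity and actor type)."""
    counts = Counter()
    for passage in corpus:
        tokens = {t.strip(".,").lower() for t in passage.split()}
        # Keep only passages that mention the scene context.
        if not context_terms & tokens:
            continue
        counts.update(tokens & lexicon)
    total = sum(counts.values()) or 1
    # Distribution weight: relative frequency among matched passages.
    return [(term, n / total) for term, n in counts.most_common()]

if __name__ == "__main__":
    # Query context: activity "race", actor type "runner" (assumed labels).
    for term, weight in rank_mental_states({"race", "runner"}, CORPUS, MENTAL_STATE_TERMS):
        print(f"{term}: {weight:.2f}")

On this toy input the sketch ranks "determined" and "focused" above "excited" and "frustrated"; the dissertation's actual models additionally weight candidates by the semantic relatedness of mental-state terms and combine several extraction models.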
Type:
text; Electronic Dissertation
Keywords:
computer vision; information extraction; information retrieval; mental state inference; natural language processing; Computer Science; artificial intelligence
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Computer Science
Degree Grantor:
University of Arizona
Advisor:
Cohen, Paul R.; Surdeanu, Mihai
Committee Members:
Cohen, Paul R.; Surdeanu, Mihai; Barnard, Kobus; McAllister, Ken S.
Language:
en_US