SPEECH AND LANGUAGE TECHNOLOGIES FOR SEMANTICALLY LINKED INSTRUCTIONAL CONTENT

Persistent Link:
http://hdl.handle.net/10150/201498
Title:
SPEECH AND LANGUAGE TECHNOLOGIES FOR SEMANTICALLY LINKED INSTRUCTIONAL CONTENT
Author:
Swaminathan, Ranjini
Issue Date:
2011
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Recent advances in technology have made it possible to offer educational content online in the form of e-learning systems. The Semantically Linked Instructional Content (SLIC) system, developed at The University of Arizona, is one such system that hosts educational and technical videos online.
This dissertation proposes the integration of speech and language technologies with the SLIC system.
Speech transcripts are increasingly used in video browsing systems to help users understand the video content and to search it with text queries. Transcripts are especially useful for people with disabilities and for those who have a limited understanding of the language of the video. Automatic Speech Recognizers (ASRs) are commonly used to generate speech transcripts for videos but are not consistent in their performance. This issue is more pronounced in a system like SLIC because the talks are technical and contain words absent from the ASR vocabulary, and because many speakers with different voices and accents make recognition harder.
The videos in SLIC come with presentation slides that contain words specific to the talk subject, and the speech transcript itself can be considered to be composed of these slide words interspersed with other words. Furthermore, the errors in the transcript are words that sound similar to what was actually spoken; for example, "notes" instead of "nodes". The errors that occur due to misrecognized slide words can be fixed if we know which slide words were actually spoken and where they occur in the transcript. In other words, the slide words are matched, or aligned, with the transcript.
In this dissertation two algorithms are developed to phonetically align transcript words with slide words, based on a Hidden Markov Model and a Hybrid hidden semi-Markov model respectively. The slide words constitute the hidden states and the transcript words are the observed states in both models.
The alignment algorithms are adapted for different applications such as transcript correction (as already mentioned), search and indexing, video segmentation and closed captioning. Results from the experiments show that the corrected transcripts have improved accuracy and yield better search results for slide word queries.
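The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes character-level similarity (Python's `difflib.SequenceMatcher`) as a crude stand-in for phonetic similarity, a hypothetical `<filler>` state for transcript words that are not slide words, and made-up transition weights that slightly favor moving to the next slide word. All function names are illustrative.

```python
import math
from difflib import SequenceMatcher

FILLER = "<filler>"  # hypothetical hidden state for non-slide words (assumption)


def emission(state, observed):
    """Unnormalized score that `observed` was recognized when `state` was spoken."""
    if state == FILLER:
        return 0.5  # assumed flat score for words that match no slide word
    # Character similarity stands in for phonetic similarity (assumption).
    return max(SequenceMatcher(None, state, observed).ratio(), 1e-6)


def transition(n_states, i, j):
    """Assumed weights: from slide word i, moving to slide word i + 1 counts double."""
    weights = [2.0 if (i > 0 and k == i + 1) else 1.0 for k in range(n_states)]
    return weights[j] / sum(weights)


def viterbi_align(slide_words, transcript_words):
    """Return the most likely hidden state (slide word or FILLER) per transcript word."""
    if not transcript_words:
        return []
    states = [FILLER] + list(slide_words)
    n = len(states)
    # Uniform start distribution, then the first emission, all in log space.
    score = [math.log(1.0 / n) + math.log(emission(s, transcript_words[0]))
             for s in states]
    backpointers = []
    for obs in transcript_words[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor for state j.
            best_i = max(range(n),
                         key=lambda i: score[i] + math.log(transition(n, i, j)))
            pointers.append(best_i)
            new_score.append(score[best_i]
                             + math.log(transition(n, best_i, j))
                             + math.log(emission(states[j], obs)))
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    j = max(range(n), key=score.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [states[k] for k in path]


def correct_transcript(slide_words, transcript_words):
    """Replace each transcript word aligned to a slide word with that slide word."""
    aligned = viterbi_align(slide_words, transcript_words)
    return [obs if state == FILLER else state
            for state, obs in zip(aligned, transcript_words)]


if __name__ == "__main__":
    fixed = correct_transcript(["nodes", "graph"], "the notes of the graph".split())
    print(" ".join(fixed))  # the nodes of the graph
```

In this toy run the misrecognized word "notes" is aligned to the slide word "nodes" and replaced, while the remaining words fall to the filler state and are left unchanged. A real system would presumably compare phoneme sequences rather than spellings and would estimate the model parameters from data.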
Type:
text; Electronic Dissertation
Keywords:
machine learning; multimedia; Computer Science; education technology; language processing
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Computer Science
Degree Grantor:
University of Arizona
Advisor:
Barnard, Kobus

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | SPEECH AND LANGUAGE TECHNOLOGIES FOR SEMANTICALLY LINKED INSTRUCTIONAL CONTENT | en_US
dc.creator | Swaminathan, Ranjini | en_US
dc.contributor.author | Swaminathan, Ranjini | en_US
dc.date.issued | 2011 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
dc.subject | machine learning | en_US
dc.subject | multimedia | en_US
dc.subject | Computer Science | en_US
dc.subject | education technology | en_US
dc.subject | language processing | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Barnard, Kobus | en_US
dc.contributor.committeemember | Efrat, Alon | en_US
dc.contributor.committeemember | Fong, Sandiway | en_US
dc.contributor.committeemember | Amir, Arnon | en_US
dc.contributor.committeemember | Barnard, Kobus | en_US
This item is licensed under a Creative Commons License
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.