Toward Enhancing Automated Credibility Assessment: A Model for Question Type Classification and Tools for Linguistic Analysis

Persistent Link:
http://hdl.handle.net/10150/145456
Title:
Toward Enhancing Automated Credibility Assessment: A Model for Question Type Classification and Tools for Linguistic Analysis
Author:
Moffitt, Kevin Christopher
Issue Date:
2011
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
The three objectives of this dissertation were to develop a question type model for predicting linguistic features of responses to interview questions, create a tool for linguistic analysis of documents, and use lexical bundle analysis to identify linguistic differences between fraudulent and non-fraudulent financial reports. First, the Moffitt Question Type Model (MQTM) was developed to aid in predicting linguistic features of responses to questions. It focuses on three context-independent features of questions: tense (past vs. present vs. future), perspective (introspective vs. extrospective), and abstractness (concrete vs. conjectural). The MQTM was tested on responses to real-world pre-polygraph examination questions from interviews with guilty (n = 27) and innocent (n = 20) interviewees. The responses were grouped according to question type, and the linguistic cues from each group's transcripts were compared using independent-samples t-tests, with the following results: future tense questions elicited more future tense words than either past or present tense questions, and present tense questions elicited more present tense words than past tense questions; introspective questions elicited more cognitive process words and affective words than extrospective questions; and conjectural questions elicited more auxiliary verbs, tentativeness words, and cognitive process words than concrete questions. Second, a tool for linguistic analysis of text documents, Structured Programming for Linguistic Cue Extraction (SPLICE), was developed to help researchers and software developers compute linguistic values for dictionary-based cues and cues that require natural language processing techniques. SPLICE provides a GUI for researchers and an API for developers. Finally, an analysis of 560 lexical bundles detected linguistic differences between 101 fraudulent and 101 non-fraudulent 10-K filings.
Phrases such as "the fair value of" and "goodwill and other intangible assets" were used at a much higher rate in fraudulent 10-Ks. A principal component analysis reduced the number of variables to 88 orthogonal components, which were used in a discriminant analysis that classified the documents with 71% accuracy. Findings in this dissertation suggest that the MQTM could be used to predict features of interviewee responses in most contexts and that lexical bundle analysis is a viable tool for discriminating between fraudulent and non-fraudulent text.
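The lexical bundle analysis described above counts fixed multi-word sequences (e.g., four-word bundles such as "the fair value of") and compares their usage rates across document sets. The following is a minimal sketch of that counting step, assuming plain-text 10-K filings as input; the function names are illustrative and are not part of SPLICE:

```python
from collections import Counter
import re

def lexical_bundles(text, n=4):
    """Count contiguous n-word sequences (lexical bundles) in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bundle_rate(bundle, docs, n=4):
    """Occurrences of one bundle per 1,000 bundles across a document set."""
    hits = total = 0
    for doc in docs:
        counts = lexical_bundles(doc, n)
        hits += counts[bundle]
        total += sum(counts.values())
    return 1000.0 * hits / total if total else 0.0

excerpt = ("the fair value of the reporting unit exceeds "
           "the carrying amount of goodwill and other intangible assets")
counts = lexical_bundles(excerpt)
```

Per-document rates computed this way for each of the 560 bundles could then serve as the input variables to the principal component and discriminant analyses described above.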
Type:
Electronic Dissertation; text
Keywords:
Automated Linguistic Analysis; Credibility Assessment; Fraudulent Financial Reporting; Question Type
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Management Information Systems
Degree Grantor:
University of Arizona
Advisor:
Burgoon, Judee K.; Nunamaker, Jay F.

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Toward Enhancing Automated Credibility Assessment: A Model for Question Type Classification and Tools for Linguistic Analysis | en_US
dc.creator | Moffitt, Kevin Christopher | en_US
dc.contributor.author | Moffitt, Kevin Christopher | en_US
dc.date.issued | 2011 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | The three objectives of this dissertation were to develop a question type model for predicting linguistic features of responses to interview questions, create a tool for linguistic analysis of documents, and use lexical bundle analysis to identify linguistic differences between fraudulent and non-fraudulent financial reports. First, the Moffitt Question Type Model (MQTM) was developed to aid in predicting linguistic features of responses to questions. It focuses on three context-independent features of questions: tense (past vs. present vs. future), perspective (introspective vs. extrospective), and abstractness (concrete vs. conjectural). The MQTM was tested on responses to real-world pre-polygraph examination questions from interviews with guilty (n = 27) and innocent (n = 20) interviewees. The responses were grouped according to question type, and the linguistic cues from each group's transcripts were compared using independent-samples t-tests, with the following results: future tense questions elicited more future tense words than either past or present tense questions, and present tense questions elicited more present tense words than past tense questions; introspective questions elicited more cognitive process words and affective words than extrospective questions; and conjectural questions elicited more auxiliary verbs, tentativeness words, and cognitive process words than concrete questions. Second, a tool for linguistic analysis of text documents, Structured Programming for Linguistic Cue Extraction (SPLICE), was developed to help researchers and software developers compute linguistic values for dictionary-based cues and cues that require natural language processing techniques. SPLICE provides a GUI for researchers and an API for developers. Finally, an analysis of 560 lexical bundles detected linguistic differences between 101 fraudulent and 101 non-fraudulent 10-K filings. Phrases such as "the fair value of" and "goodwill and other intangible assets" were used at a much higher rate in fraudulent 10-Ks. A principal component analysis reduced the number of variables to 88 orthogonal components, which were used in a discriminant analysis that classified the documents with 71% accuracy. Findings in this dissertation suggest that the MQTM could be used to predict features of interviewee responses in most contexts and that lexical bundle analysis is a viable tool for discriminating between fraudulent and non-fraudulent text. | en_US
dc.type | Electronic Dissertation | en_US
dc.type | text | en_US
dc.subject | Automated Linguistic Analysis | en_US
dc.subject | Credibility Assessment | en_US
dc.subject | Fraudulent Financial Reporting | en_US
dc.subject | Question Type | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Management Information Systems | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Burgoon, Judee K. | en_US
dc.contributor.advisor | Nunamaker, Jay F. | en_US
dc.contributor.committeemember | Zhang, Zhu | en_US
dc.identifier.proquest | 11559 | -
dc.identifier.oclc | 752261422 | -
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.