A system of deception and fraud detection using reliable linguistic cues including hedging, disfluencies, and repeated phrases

Persistent Link:
http://hdl.handle.net/10150/196115
Title:
A system of deception and fraud detection using reliable linguistic cues including hedging, disfluencies, and repeated phrases
Author:
Humpherys, Sean L.
Issue Date:
2010
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Given the increasing problem of fraud, crime, and national security threats, assessing credibility is a recurring research topic in Information Systems and in other disciplines. Decision support systems can help. But the success of the system depends on reliable cues that can distinguish deceptive/truthful behavior and on a proven classification algorithm. This investigation aims to identify linguistic cues that distinguish deceivers from truthtellers; and it aims to demonstrate how the cues can successfully classify deception and truth.
Three new datasets were gathered: 202 fraudulent and nonfraudulent financial disclosures (10-Ks), a laboratory experiment that asked twelve questions of participants who answered deceptively to some questions and truthfully to others (Cultural Interviews), and a mock crime experiment where some participants stole a ring from an office and where all participants were interviewed as to their guilt or innocence (Mock Crime). Transcribed participant responses were investigated for distinguishing cues and used for classification testing.
Disfluencies (e.g., um, uh, repeated phrases, etc.), hedging words (e.g., perhaps, may, etc.), and interjections (e.g., okay, like, etc.) are theoretically developed as potential cues to deception. Past research provides conflicting evidence regarding disfluency use and deception. Some researchers opine that deception increases cognitive load, which lowers attentional resources, which increases speech errors, and thereby increases disfluency use (i.e., Cognitive-Load Disfluency theory). Other researchers argue against the causal link between disfluencies and speech errors, positing that disfluencies are controllable and that deceivers strategically avoid disfluencies to avoid appearing hesitant or untruthful (i.e., Suppression-Disfluency theory). A series of t-tests, repeated measures GLMs, and nested-model design regressions disconfirm the Suppression-Disfluency theory.
Um, uh, and interjections are used at an increased rate by deceivers in spontaneous speech. Reverse order questioning did not increase disfluency use. Fraudulent 10-Ks have a higher mean count of hedging words.
Statistical classifiers and machine learning algorithms are demonstrated on the three datasets. A feature reduction by backward Wald stepwise with logistic regression had the highest classification accuracies (69%-87%). Accuracies are compared to professional interviewers and to previously researched classification models. In many cases the new models demonstrated improvements. 10-Ks are classified with 69% overall accuracy.
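As an illustration only, the cue categories named in the abstract could be counted in a transcript roughly as follows. This is a minimal Python sketch, not the dissertation's method: the word lists contain only the examples quoted in the abstract (the full lexicons are not given in this record), and the names CUES and cue_rates and the sample sentence are hypothetical.

```python
import re

# Illustrative cue lists: only the example words quoted in the abstract.
# These are assumptions for the sketch, not the dissertation's lexicons.
CUES = {
    "disfluency": {"um", "uh"},
    "hedge": {"perhaps", "may"},
    "interjection": {"okay", "like"},
}


def cue_rates(text):
    """Return, for each cue category, its occurrences per 100 words of text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    rates = {}
    for name, lexicon in CUES.items():
        hits = sum(1 for word in words if word in lexicon)
        # Guard against empty input to avoid dividing by zero.
        rates[name] = 100.0 * hits / total if total else 0.0
    return rates


# Hypothetical transcript fragment, invented for this example.
sample = "Um, I was, uh, perhaps in the hall. Okay, I may have seen it."
print(cue_rates(sample))
```

Rates are expressed per 100 words so that responses of different lengths can be compared. A plain word match cannot tell a hedging "may" from the month, or a filler "like" from the verb, and it does not detect repeated phrases.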
Type:
text; Electronic Dissertation
Keywords:
classification algorithm; credibility assessment; deception detection; decision support systems; disfluency; interjections
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Management Information Systems; Graduate College
Degree Grantor:
University of Arizona
Advisor:
Nunamaker, Jay F.; Burgoon, Judee K.
Committee Chair:
Nunamaker, Jay F.

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | A system of deception and fraud detection using reliable linguistic cues including hedging, disfluencies, and repeated phrases | en_US
dc.creator | Humpherys, Sean L. | en_US
dc.contributor.author | Humpherys, Sean L. | en_US
dc.date.issued | 2010 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | Given the increasing problem of fraud, crime, and national security threats, assessing credibility is a recurring research topic in Information Systems and in other disciplines. Decision support systems can help. But the success of the system depends on reliable cues that can distinguish deceptive/truthful behavior and on a proven classification algorithm. This investigation aims to identify linguistic cues that distinguish deceivers from truthtellers; and it aims to demonstrate how the cues can successfully classify deception and truth. Three new datasets were gathered: 202 fraudulent and nonfraudulent financial disclosures (10-Ks), a laboratory experiment that asked twelve questions of participants who answered deceptively to some questions and truthfully to others (Cultural Interviews), and a mock crime experiment where some participants stole a ring from an office and where all participants were interviewed as to their guilt or innocence (Mock Crime). Transcribed participant responses were investigated for distinguishing cues and used for classification testing. Disfluencies (e.g., um, uh, repeated phrases, etc.), hedging words (e.g., perhaps, may, etc.), and interjections (e.g., okay, like, etc.) are theoretically developed as potential cues to deception. Past research provides conflicting evidence regarding disfluency use and deception. Some researchers opine that deception increases cognitive load, which lowers attentional resources, which increases speech errors, and thereby increases disfluency use (i.e., Cognitive-Load Disfluency theory). Other researchers argue against the causal link between disfluencies and speech errors, positing that disfluencies are controllable and that deceivers strategically avoid disfluencies to avoid appearing hesitant or untruthful (i.e., Suppression-Disfluency theory). A series of t-tests, repeated measures GLMs, and nested-model design regressions disconfirm the Suppression-Disfluency theory. Um, uh, and interjections are used at an increased rate by deceivers in spontaneous speech. Reverse order questioning did not increase disfluency use. Fraudulent 10-Ks have a higher mean count of hedging words. Statistical classifiers and machine learning algorithms are demonstrated on the three datasets. A feature reduction by backward Wald stepwise with logistic regression had the highest classification accuracies (69%-87%). Accuracies are compared to professional interviewers and to previously researched classification models. In many cases the new models demonstrated improvements. 10-Ks are classified with 69% overall accuracy. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
dc.subject | classification algorithm | en_US
dc.subject | credibility assessment | en_US
dc.subject | deception detection | en_US
dc.subject | decision support systems | en_US
dc.subject | disfluency | en_US
dc.subject | interjections | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Management Information Systems | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Nunamaker, Jay F. | en_US
dc.contributor.advisor | Burgoon, Judee K. | en_US
dc.contributor.chair | Nunamaker, Jay F. | en_US
dc.contributor.committeemember | Burgoon, Judee K. | en_US
dc.contributor.committeemember | Goes, Paulo B. | en_US
dc.contributor.committeemember | Chan, Erwin | en_US
dc.identifier.proquest | 11341 | en_US
dc.identifier.oclc | 752261200 | en_US