Mathematical programming in data mining: Models for binary classification with application to collusion detection in online gambling

Persistent Link:
http://hdl.handle.net/10150/280270
Title:
Mathematical programming in data mining: Models for binary classification with application to collusion detection in online gambling
Author:
Domm, Maryanne
Issue Date:
2003
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Data mining is a semi-automated technique for discovering patterns and trends in large amounts of data, and it can be used to build statistical models that predict those patterns and trends. One type of prediction model is a classifier, which attempts to predict the group to which a particular item belongs. An important binary classifier, the Support Vector Machine classifier, uses nonlinear optimization to find a hyperplane separating the two classes of data. This classifier has been reformulated both as a linear program and as a pure quadratic program. We propose two modeling extensions to the Support Vector Machine classifier. The first, the Linearized Proximal Support Vector Machine classifier, linearizes the objective function of the pure quadratic version, which reduces the weight the classifier gives to outlying data points. The second, the Integer Support Vector Machine classifier, improves the conceptual accuracy of the model: it uses binary indicator variables to flag potential misclassification errors and minimizes these errors directly. The performance of both new classifiers was evaluated on a simple two-dimensional data set and on several data sets commonly used in the literature, and was compared with that of the original classifiers. The classifiers were then used to build a model to detect collusion in online gambling. Collusion occurs when two or more players play differently against each other than against the rest of the players. Collusion is easier for online gamblers because their communication cannot be intercepted. However, collusion can still be identified by examining the playing style of the colluding players. By analyzing the record of play from online poker, one can build a model that predicts whether a hand contains colluding players. We found that the new classifiers performed about as well as the previous classifiers, sometimes better and sometimes worse. We also found that one form of online collusion could be detected, though not perfectly.
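The abstract names a "pure quadratic" SVM reformulation as the starting point for the Linearized Proximal SVM but does not give its equations. The sketch below assumes that this is the linear proximal SVM from the literature: minimize (ν/2)‖y‖² + ½(‖w‖² + γ²) subject to D(Aw − eγ) + y = e. Here A holds the data points as rows, D = diag(d) holds the ±1 labels, and e is a vector of ones. Substituting for y reduces training to one linear system, (I/ν + HᵀH)[w; γ] = Hᵀd with H = [A, −e]. The helper names (`solve_linear_system`, `train_psvm`, `classify`) and the toy data are hypothetical illustrations, not the dissertation's code. The linearized and integer variants are not shown, because their exact formulations are not given here.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(M: List[List[float]], b: List[float]) -> List[float]:
    """Hypothetical helper: solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]  # augmented matrix [M | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def train_psvm(points: Sequence[Sequence[float]], labels: Sequence[int],
               nu: float = 1.0) -> Tuple[List[float], float]:
    """Assumed linear proximal SVM: solve (I/nu + H^T H) [w; gamma] = H^T d, H = [A, -e]."""
    m, n = len(points), len(points[0])
    H = [[float(v) for v in p] + [-1.0] for p in points]  # append the -e column
    k = n + 1
    M = [[(1.0 / nu if i == j else 0.0) + sum(H[r][i] * H[r][j] for r in range(m))
          for j in range(k)] for i in range(k)]
    rhs = [sum(H[r][i] * labels[r] for r in range(m)) for i in range(k)]  # H^T D e = H^T d
    u = solve_linear_system(M, rhs)
    return u[:n], u[n]


def classify(w: Sequence[float], gamma: float, x: Sequence[float]) -> int:
    """Return +1 or -1 according to the side of the plane x^T w = gamma."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1


# Hypothetical toy data: two point clouds that are symmetric about the origin.
toy_points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
toy_labels = [1, 1, 1, -1, -1, -1]
w, gamma = train_psvm(toy_points, toy_labels, nu=1.0)
print("w =", w, "gamma =", gamma)
print([classify(w, gamma, x) for x in toy_points])
```

For this symmetric toy set, the system works out by hand to w = (22/131, 28/131) and γ = 0, and every training point lands on the correct side. A hand-written solver keeps the sketch dependency-free. In practice a library routine such as `numpy.linalg.solve` would normally be used, and the key property of the proximal formulation holds either way: training costs one small (n+1)×(n+1) linear solve rather than a constrained optimization.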
Type:
text; Dissertation-Reproduction (electronic)
Keywords:
Operations Research.
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Systems and Industrial Engineering
Degree Grantor:
University of Arizona
Advisor:
Goldberg, Jeff

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en_US
dc.title | Mathematical programming in data mining: Models for binary classification with application to collusion detection in online gambling | en_US
dc.creator | Domm, Maryanne | en_US
dc.contributor.author | Domm, Maryanne | en_US
dc.date.issued | 2003 | en_US
dc.publisher | The University of Arizona. | en_US
dc.type | text | en_US
dc.type | Dissertation-Reproduction (electronic) | en_US
dc.subject | Operations Research. | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Systems and Industrial Engineering | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Goldberg, Jeff | en_US
dc.identifier.proquest | 3089927 | en_US
dc.identifier.bibrecord | .b44419053 | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.