Persistent Link:
http://hdl.handle.net/10150/193660
Title:
Anomaly-based Self-Healing Framework in Distributed Systems
Author:
Kim, Byoung Uk
Issue Date:
2008
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Reliability and robustness to hardware and software failures are important design criteria for distributed systems and their applications. The growing complexity, interconnectedness, and interdependency of their components, together with the asynchronous interactions among them, make fault detection and tolerance a challenging research problem; these components include hardware resources (computers, servers, network devices) and software (application services, middleware, web services, etc.). In this dissertation, we present a self-healing methodology based on the principles of autonomic computing and on statistical and data mining techniques to detect faults (hardware or software) and to identify their source. In our approach, we monitor and analyze in real time all interactions among the components of a distributed system using two software modules: the Component Fault Manager (CFM), which monitors the full set of measurement attributes for applications and nodes, and the Application Fault Manager (AFM), which is responsible for several activities such as monitoring, anomaly analysis, root cause analysis, and recovery. We use a three-dimensional array of features to capture spatial and temporal behavior; an anomaly analysis engine uses this array to generate an alert immediately when it detects an abnormal behavior pattern caused by a software or hardware failure. We use several fault tolerance metrics (false positive, false negative, precision, recall, missed alarm rate, detection accuracy, latency, and overhead) to evaluate the effectiveness of our self-healing approach compared with other techniques. We applied our approach to an industry-standard web e-commerce application to emulate a complex e-commerce environment. We evaluate how effectively and efficiently our approach detects software faults that we inject asynchronously, and we compare the results for different noise levels.
Our experimental results showed that applying our anomaly-based approach significantly improves false positives, false negatives, missed alarm rate, and detection accuracy. For example, evaluating this approach on asynchronously injected faults shows a detection rate above 99.9% with no false alarms for a wide range of faulty and normal operational scenarios.
Type:
text; Electronic Dissertation
Keywords:
Anomaly analysis; distributed system; Fault; Reliability; Self-healing
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Electrical & Computer Engineering; Graduate College
Degree Grantor:
University of Arizona
Advisor:
Hariri, Salim
Committee Chair:
Hariri, Salim
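
The abstract names several fault tolerance metrics (false positive, false negative, precision, recall, missed alarm rate, detection accuracy) without defining them. The sketch below shows one common way to compute such metrics from confusion-matrix counts. It is an illustration only: the class and function names are hypothetical, the formulas are the standard textbook definitions, and the dissertation itself may define some of these metrics differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Counts over monitored intervals (hypothetical structure, not from the dissertation)."""
    tp: int  # faulty intervals that raised an alert
    fp: int  # normal intervals that raised an alert (false alarms)
    tn: int  # normal intervals with no alert
    fn: int  # faulty intervals with no alert (missed alarms)


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 when the denominator is zero so empty classes do not raise.
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: Confusion) -> Dict[str, float]:
    """Standard definitions; assumed here, not taken from the source."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "missed_alarm_rate": _ratio(c.fn, c.tp + c.fn),
        "detection_accuracy": _ratio(c.tp + c.tn, total),
    }


if __name__ == "__main__":
    # Made-up counts for illustration; these are not results from the dissertation.
    example = Confusion(tp=8, fp=1, tn=90, fn=1)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, recall and missed alarm rate always sum to 1 whenever at least one faulty interval exists, so a result such as "no false alarms" corresponds to a false positive rate of 0 rather than to a statement about recall.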

Full metadata record

DC Field | Value | Language
dc.language.iso | en | en_US
dc.title | Anomaly-based Self-Healing Framework in Distributed Systems | en_US
dc.creator | Kim, Byoung Uk | en_US
dc.contributor.author | Kim, Byoung Uk | en_US
dc.date.issued | 2008 | en_US
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.description.abstract | Reliability and robustness to hardware and software failures are important design criteria for distributed systems and their applications. The growing complexity, interconnectedness, and interdependency of their components, together with the asynchronous interactions among them, make fault detection and tolerance a challenging research problem; these components include hardware resources (computers, servers, network devices) and software (application services, middleware, web services, etc.). In this dissertation, we present a self-healing methodology based on the principles of autonomic computing and on statistical and data mining techniques to detect faults (hardware or software) and to identify their source. In our approach, we monitor and analyze in real time all interactions among the components of a distributed system using two software modules: the Component Fault Manager (CFM), which monitors the full set of measurement attributes for applications and nodes, and the Application Fault Manager (AFM), which is responsible for several activities such as monitoring, anomaly analysis, root cause analysis, and recovery. We use a three-dimensional array of features to capture spatial and temporal behavior; an anomaly analysis engine uses this array to generate an alert immediately when it detects an abnormal behavior pattern caused by a software or hardware failure. We use several fault tolerance metrics (false positive, false negative, precision, recall, missed alarm rate, detection accuracy, latency, and overhead) to evaluate the effectiveness of our self-healing approach compared with other techniques. We applied our approach to an industry-standard web e-commerce application to emulate a complex e-commerce environment. We evaluate how effectively and efficiently our approach detects software faults that we inject asynchronously, and we compare the results for different noise levels. Our experimental results showed that applying our anomaly-based approach significantly improves false positives, false negatives, missed alarm rate, and detection accuracy. For example, evaluating this approach on asynchronously injected faults shows a detection rate above 99.9% with no false alarms for a wide range of faulty and normal operational scenarios. | en_US
dc.type | text | en_US
dc.type | Electronic Dissertation | en_US
dc.subject | Anomaly analysis | en_US
dc.subject | distributed system | en_US
dc.subject | Fault | en_US
dc.subject | Reliability | en_US
dc.subject | Self-healing | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Electrical & Computer Engineering | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Hariri, Salim | en_US
dc.contributor.chair | Hariri, Salim | en_US
dc.contributor.committeemember | Rozenblit, Jerzy W. | en_US
dc.contributor.committeemember | Akoglu, Ali | en_US
dc.identifier.proquest | 10119 | en_US
dc.identifier.oclc | 659750660 | en_US