Evaluation and Optimization of Turnaround Time and Cost of HPC Applications on the Cloud

Persistent Link:
http://hdl.handle.net/10150/332825
Title:
Evaluation and Optimization of Turnaround Time and Cost of HPC Applications on the Cloud
Author:
Marathe, Aniruddha Prakash
Issue Date:
2014
Publisher:
The University of Arizona.
Rights:
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract:
Amazon's EC2 cloud platform has grown in popularity for commercial and scientific high-performance computing (HPC) applications in recent years. However, many HPC users consider dedicated high-performance clusters, typically found in large compute centers such as those in national laboratories, to be far superior to EC2 because of the latter's significant communication overhead. We find this view to be quite narrow; the proper metrics for comparing high-performance clusters to EC2 are turnaround time and cost. In this work, we first compare the HPC-grade EC2 cluster to top-of-the-line HPC clusters based on turnaround time and total cost of execution. When measuring turnaround time, we include expected queue wait time on HPC clusters. Our results show that although, as expected, standard HPC clusters are superior in raw performance, they suffer from potentially significant queue wait times. We show that EC2 clusters may produce better turnaround times due to typically lower queue wait times. To estimate cost, we developed a pricing model, relative to EC2's node-hour prices, to set node-hour prices for (currently free) HPC clusters. We observe that the cost-effectiveness of running an application on a cluster depends on raw performance and application scalability. However, despite the potentially lower queue wait and turnaround times, the primary barrier to using clouds for many HPC users is the cost. Amazon EC2 provides a fixed-cost option (called on-demand) and a variable-cost, auction-based option (called the spot market). The spot market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete.
Specifically, we design and implement multiple techniques to reduce expected cost by exploiting redundancy in the EC2 spot market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs at up to 7x lower cost than the on-demand market and up to 44% lower cost than the best non-redundant spot-market algorithm. Finally, we extend our adaptive algorithm to exploit several opportunities for cost savings on the EC2 spot market. First, we incorporate application scalability characteristics into our adaptive policy. We show that the adaptive algorithm, when informed by the scalability characteristics of applications, achieves up to 56% cost savings compared to the expected cost of the base adaptive algorithm run at a fixed, user-defined scale. Second, we demonstrate the potential for obtaining considerable free computation time on the spot market, enabled by its hour-boundary pricing model.
Type:
text; Electronic Dissertation
Keywords:
Cloud Computing; Cost-performance tradeoff; High Performance Computing; Scheduling; Spot Market; Computer Science; Amazon EC2
Degree Name:
Ph.D.
Degree Level:
doctoral
Degree Program:
Graduate College; Computer Science
Degree Grantor:
University of Arizona
Advisor:
Lowenthal, David K.

Full metadata record

DC Field | Value | Language
dc.language.iso | en_US | en
dc.title | Evaluation and Optimization of Turnaround Time and Cost of HPC Applications on the Cloud | en_US
dc.creator | Marathe, Aniruddha Prakash | en_US
dc.contributor.author | Marathe, Aniruddha Prakash | en_US
dc.date.issued | 2014 | -
dc.publisher | The University of Arizona. | en_US
dc.rights | Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. | en_US
dc.type | text | en
dc.type | Electronic Dissertation | en
dc.subject | Cloud Computing | en_US
dc.subject | Cost-performance tradeoff | en_US
dc.subject | High Performance Computing | en_US
dc.subject | Scheduling | en_US
dc.subject | Spot Market | en_US
dc.subject | Computer Science | en_US
dc.subject | Amazon EC2 | en_US
thesis.degree.name | Ph.D. | en_US
thesis.degree.level | doctoral | en_US
thesis.degree.discipline | Graduate College | en_US
thesis.degree.discipline | Computer Science | en_US
thesis.degree.grantor | University of Arizona | en_US
dc.contributor.advisor | Lowenthal, David K. | en_US
dc.contributor.committeemember | Lowenthal, David K. | en_US
dc.contributor.committeemember | de Supinski, Bronis R. | en_US
dc.contributor.committeemember | Hartman, John | en_US
dc.contributor.committeemember | Gniady, Christopher | en_US
All Items in UA Campus Repository are protected by copyright, with all rights reserved, unless otherwise indicated.