StarCluster - Mailing List Archive

Re: Light Reading - My thesis on Elastic Load Balancing

From: Joseph <Kyeong>
Date: Sun, 3 Apr 2011 18:25:18 +0100

Hi Rajat,

Many thanks for sharing the thesis.
As I said, I've been really looking forward to seeing this and now I have it!

Because we have a 4-week Easter break here in the UK starting tomorrow,
I will have time to read your thesis.
Hopefully, I will get back to you with constructive feedback and comments.

Again, many thanks for your great contribution and for sharing the thesis.


P.S. I found all your figures rather blurred, possibly because they use a
bitmap format (e.g., PNG). You may want to consider a vector graphics
format (e.g., PS/EPS) instead; MS Word has an EPS import filter.

Kyeong Soo (Joseph) Kim, Ph.D.
Senior Lecturer in Networking
Room 112, Digital Technium
Multidisciplinary Nanotechnology Centre, College of Engineering
Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
TEL: +44 (0)1792 602024
On Sun, Apr 3, 2011 at 2:28 PM, Rajat Banerjee wrote:
> Hello list,
> Here is my thesis describing my work on Elastic Load Balancing in
> StarCluster. Many thanks to Justin Riley for his help in getting this done.
> The entire PDF is located at:
> It is 71 pages long.
> Here is the abstract:
> Abstract
> Computing in the cloud provides companies and colleges a new way to perform
> sophisticated computational tasks. Amazon.com, Inc. (Amazon) is the leading
> provider of cloud infrastructure, and their solutions are used by thousands
> of companies, universities and individuals. Amazon’s service, dubbed Elastic
> Compute Cloud (EC2), allows users to rent servers by the hour, so that
> computing power can be increased and decreased as needed. It eliminates the
> need for companies to build and maintain expensive data centers. Instead
> customers can rent servers to perform tasks as needed, and turn them off
> when the tasks are completed.
> The ability to quickly add and remove computing capacity enables users to
> scale computing capacity in business and academic settings alike. When one
> needs to perform sophisticated calculations, process large data sets, or
> serve many concurrent clients, having more computing power improves
> throughput and responsiveness of the system. Tasks can be completed in less
> time and client requests can be served faster. In a traditional environment
> where a company or university builds and maintains every server in its data
> center, it takes days or even weeks to add new computing capacity, and costs
> a significant amount of money. Amazon EC2 allows for instant addition and
> removal of capacity, and their services are reasonably priced. A new server
> can be available in as little as five minutes and can then be terminated at
> any time. Server usage is billed by the hour, so users pay only for the
> hours they use. This flexibility, coupled with Amazon’s low prices, is a
> boon to anyone who needs to perform complex computational tasks for short or
> unpredictable time periods.
> The need for enormous amounts of computing power for short periods of time
> is a common characteristic of scientists performing High Performance
> Computing (HPC). HPC tasks are crucially important to modern science and can
> range from the modeling of microscopic molecular interactions in a protein
> to a nuclear weapon simulation. Before the availability of cloud computing
> resources, HPC users ran their computational tasks almost exclusively on
> very expensive supercomputers, which can cost in excess of $500 per hour and
> must be reserved ahead of time. These supercomputers are installed at many
> major universities, corporations, and research laboratories, but are not
> easily accessible because of their high cost. The recent installation of
> IBM’s Roadrunner supercomputer at Los Alamos National Laboratory in New
> Mexico cost over $133 million.
> With program decomposition techniques, scientists can break up seemingly
> intractable problems into smaller, more manageable subtasks that run
> independently. The problem can be solved by these extremely powerful
> supercomputers, which distribute the subtasks among the many discrete
> processors within the supercomputer. The processors have speedy
> communication channels between them that offer plenty of bandwidth. When
> discrete subtasks within the larger problem need to share information, such
> as the attractive charges emitted by a molecule in a protein folding
> simulation, that information is sent fast and frequently over the
> inter-processor communication links. Protein Folding simulations are
> particularly well suited toward parallelization because small parts of the
> molecule can be simulated independently, and then the individual results can
> be used to find the ideal structure of the complete protein. Parallelized
> problems like this can be solved by powerful, expensive supercomputers, or
> can be solved in a cluster of computers that are cheaper and more readily
> available. Some problems have unique requirements, like continuous
> single-threaded access to a high-powered processor, and those problems are
> out of the scope of this project.
> A project called StarCluster brings the flexibility and low cost of
> clustered, cloud computing to scientists and other users of High Performance
> Computers. Users can launch a cluster of Amazon EC2 servers, also called
> instances, through StarCluster and have a fully configured, ready-to-use
> computational cluster online in less than ten minutes, for as little as
> $0.08 per instance per hour. No reservations are required and a cluster of
> up to 20 machines can be launched at any time the user desires.
> StarCluster has made high performance computing in the cloud an affordable
> reality to many scientists who do not have access to expensive
> supercomputers. StarCluster, which is free, has approximately 500 users
> worldwide, most of whom are in academia. Using StarCluster incurs no
> additional fees beyond the nominal cost of per hour usage of EC2.
> StarCluster is a superb product for scientists who need supercomputing
> power, and who know how much time and computational resources they need to
> complete the tasks.
> Despite its many strengths, StarCluster does not easily adapt to changing
> workloads. This type of adaptability in the cloud is called elasticity. In
> StarCluster, when a cluster of instances is launched, the scientist must
> specify how many instances he or she wants. Those instances are launched
> together, and can only be terminated together. Instances cannot be
> terminated individually, even if one instance is idle. In some situations it
> is impossible to predict the workload of a cluster, such as when a scientist
> overestimates the duration of a task, or data processing runs faster than
> expected because an unexpected network upgrade transfers files faster. There
> are many reasons that a task could complete faster or slower than expected.
> It is a waste of money, in fees paid to Amazon, and a waste of energy, to
> keep many idle instances running indefinitely.
> This project, Elastic Load Balancing in EC2, aims to address this weakness
> in StarCluster by adding an Elastic Load Balancer to the project. The
> Elastic Load Balancer (ELB) will add instances to the cluster to improve job
> throughput when the cluster is heavily loaded, and terminate instances when
> they are idle to save money and energy. The ELB will periodically poll the
> cluster, analyze its workload, decide if the cluster needs to be modified,
> and add or remove instances. Through this process, StarCluster will maximize
> job throughput at busy times and save money at idle times.
> Several powerful Elastic Load Balancers are commercially available for Cloud
> and EC2 software setups, but StarCluster’s ELB is the only one specifically
> targeted toward the High Performance Computing domain. Existing ELB
> implementations are geared toward web server and application server
> environments and will be discussed in the Prior Work section. HPC workloads
> have a unique computing profile: jobs are long-running and seldom serve
> external clients. This HPC computing profile mandates a new Elastic Load
> Balancing
> strategy.
> Any comments or questions are welcome. Best,
> Rajat
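
The poll/analyze/decide/act cycle described in the abstract can be illustrated
with a minimal Python sketch. This is not the thesis's implementation: the
names ClusterStats, decide, and balance, the one-job-per-node assumption, and
the 60-second polling interval are all hypothetical. Only the 20-node ceiling
is taken from the abstract.

```python
import time
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Snapshot of the cluster (hypothetical fields, for illustration only)."""
    queued_jobs: int   # jobs waiting for a free slot
    running_jobs: int  # jobs currently executing
    idle_nodes: int    # nodes with no running job
    total_nodes: int   # nodes currently in the cluster


def decide(stats, min_nodes=1, max_nodes=20, jobs_per_node=1):
    """Return +n to add n nodes, -n to remove n nodes, or 0 to do nothing."""
    if stats.queued_jobs > 0 and stats.total_nodes < max_nodes:
        # Ceiling division: enough nodes to absorb the queue, capped at max_nodes.
        wanted = -(-stats.queued_jobs // jobs_per_node)
        return min(wanted, max_nodes - stats.total_nodes)
    if stats.queued_jobs == 0 and stats.idle_nodes > 0:
        # Shed idle nodes, but never shrink below min_nodes.
        removable = min(stats.idle_nodes, stats.total_nodes - min_nodes)
        return -max(removable, 0)
    return 0


def balance(poll, add_nodes, remove_nodes, interval=60, rounds=None,
            sleep=time.sleep):
    """Poll the cluster periodically and resize it according to decide().

    poll, add_nodes, and remove_nodes are caller-supplied callables; rounds
    limits the number of iterations (None means run until interrupted).
    """
    done = 0
    while rounds is None or done < rounds:
        delta = decide(poll())
        if delta > 0:
            add_nodes(delta)
        elif delta < 0:
            remove_nodes(-delta)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Keeping decide() free of side effects means the scaling policy can be tested
without launching any instances. A real policy for long-running HPC jobs would
likely also weigh job wait times and the hourly billing boundary before
terminating a node.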
Received on Sun Apr 03 2011 - 13:25:19 EDT