StarCluster - Mailing List Archive

Re: Starcluster and elastic load balancing

From: Kyeong Soo (Joseph) Kim
Date: Mon, 11 Apr 2011 17:16:59 +0100

Justin,

Is this related to the ssh_keyscan problem which you mentioned in
another thread (regarding the node scalability)?

Anyhow, I will give it a try with the latest code as soon as I am
ready for another round of simulation.

Regards,
Joseph

On Wed, Apr 6, 2011 at 4:13 PM, Justin Riley <justin.t.riley_at_gmail.com> wrote:
> Hi Joseph,
>
> That's strange. This problem could be related to the fact that nodes
> were failing to be added to SGE which might have thrown the load
> balancer logic off... In any event would you mind testing the latest
> code with the load balancer and see if this happens again?
>
> Thanks!
>
> ~Justin
>
> On Tue, Mar 15, 2011 at 1:27 PM, Kyeong Soo (Joseph) Kim
> <kyeongsoo.kim_at_gmail.com> wrote:
>> Hi Rajat,
>>
>> This is to report one strange behaviour I just encountered during the
>> use of StarCluster with your loadbalancer (LB).
>> Below is a snippet of the "/etc/hosts" file:
>>
>> .......
>> 10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
>> 10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
>> .....
>>
>> My cluster initially started with 10 nodes, and I ran the LB with the
>> maximum number of nodes set to 20.
>>
>> It seems that, in the middle of adding new nodes to the cluster, the
>> LB added a new node with a duplicate host name (i.e. node016) for
>> unknown reasons. I could see the second "node016" instance in the
>> AWS management console, but found that it was not used by SGE.
>> Fortunately, manually terminating the node through the console didn't
>> affect SGE or the already running jobs.
>>
>> With Regards,
>> Joseph
>> --
>> Kyeong Soo (Joseph) Kim, Ph.D.
>> Senior Lecturer in Networking
>> Room 112, Digital Technium
>> Multidisciplinary Nanotechnology Centre, College of Engineering
>> Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
>> TEL: +44 (0)1792 602024
>> EMAIL: k.s.kim_at_swansea.ac.uk
>> HOME: http://iat-hnrl.swan.ac.uk/ (group)
>>             http://iat-hnrl.swan.ac.uk/~kks/ (personal)
>>
>>
>> On Mon, Jan 31, 2011 at 9:09 AM, Kyeong Soo (Joseph) Kim
>> <kyeongsoo.kim_at_gmail.com> wrote:
>>> Hello Rajat,
>>> I am very interested in your work on the elastic load balancing; I do
>>> remember that you posted some graphs on early results in the past and that
>>> you were working on your MSc thesis.
>>> In fact, this new feature will be critical for my current research, which
>>> requires about 300 to 400 independent simulation runs, and I highly
>>> appreciate your great contribution to StarCluster.
>>> By the way, I wonder whether you have published your work in any
>>> conferences/journals yet.
>>> Regards,
>>> Joseph
>>>
>>>
>>> On Fri, Jan 28, 2011 at 6:31 PM, Rajat Banerjee <rbanerj_at_fas.harvard.edu>
>>> wrote:
>>>>
>>>> Hi Archie,
>>>> Yes, there is ELB built into the latest releases of StarCluster. I wrote
>>>> it, so feel free to write me (+ the list) with any questions.
>>>> The docs on
>>>> http://web.mit.edu/stardev/cluster/docs/index.html
>>>> haven't been updated in a while. There is a documentation page on
>>>> starcluster in the code base, see
>>>> /starcluster/StarCluster/docs/sphinx/load_balancer.rst
>>>> That doc should have all of the information you need, and is readable in
>>>> plain text.
>>>> Typically, this is how I fire up the load balancer:
>>>> starcluster bal <cluster_tag> -m <MAX_NODES you want> -n <MIN_NODES you want>
>>>> It will poll the cluster every 60 seconds and make decisions. The
>>>> decisions are described in load_balancer.rst. There is a visualizer which
>>>> makes 6 graphs with matplotlib to show you how many nodes are working, how
>>>> many jobs are running, queued, avg load, etc, but the visualizer still needs
>>>> a little bit of work.
>>>> Hope that helps, and feel free to send back questions.
>>>> Rajat Banerjee
>>>>
>>>> On Fri, Jan 28, 2011 at 12:29 PM, <starcluster-request_at_mit.edu> wrote:
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Archie Russell <archier_at_gmail.com>
>>>>> To: starcluster_at_mit.edu
>>>>> Date: Thu, 27 Jan 2011 11:40:00 -0800
>>>>> Subject: [StarCluster] Starcluster and elastic load balancing
>>>>>
>>>>> Hi,
>>>>> Online it says StarCluster has Elastic Load Balancing built into the
>>>>> latest code version at GitHub. How would I go about using this? How does
>>>>> it work, e.g. when does it fire up new nodes and when does it shut them
>>>>> down?
>>>>> Thanks,
>>>>> Archie
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>>>> StarCluster_at_mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
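
The duplicate "node016" entries in the "/etc/hosts" snippet quoted above
are the kind of conflict a short script can flag. The following is a
minimal sketch, not part of StarCluster: the function name
find_duplicate_hostnames is hypothetical, and it assumes the usual
hosts-file layout of one IP address followed by one or more names per
line. The sample text is the two lines from the report.

```python
from collections import defaultdict


def find_duplicate_hostnames(hosts_text):
    """Return {name: [ips]} for every name claimed by more than one IP.

    Hypothetical helper for illustration; it is not a StarCluster function.
    """
    ips_by_name = defaultdict(set)
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Not an "IP name [alias...]" entry; skip it.
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            ips_by_name[name].add(ip)
    # Keep only the names that map to two or more distinct addresses.
    return {name: sorted(ips) for name, ips in ips_by_name.items() if len(ips) > 1}


# The two conflicting lines from the reported /etc/hosts snippet.
sample = """\
10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
"""

duplicates = find_duplicate_hostnames(sample)
for name, ips in duplicates.items():
    print(f"{name}: {', '.join(ips)}")
```

To check a live node, the same function could be given the contents of
/etc/hosts, e.g. find_duplicate_hostnames(open("/etc/hosts").read()).
Note that a name such as "localhost" that is listed for both an IPv4 and
an IPv6 address would also be reported, so the result needs a manual look.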
Received on Mon Apr 11 2011 - 12:17:01 EDT
This archive was generated by hypermail 2.3.0.