Hi Rajat,
I am writing to report a strange behaviour I just encountered while
using StarCluster with your load balancer (LB).
Below is a snippet of the "/etc/hosts" file:
.......
10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
.....
My cluster initially started with 10 nodes, and I ran the LB with the
maximum number of nodes set to 20.
It seems that, while adding new nodes to the cluster, the LB added a
node with a duplicate host name (i.e. node016) for unknown reasons. I
could see the second "node016" instance in the AWS management console,
but found that it was not being used by SGE.
Fortunately, manually terminating that node through the console didn't
affect SGE or the jobs that were already running.
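A minimal sketch for spotting this condition, assuming the standard
"/etc/hosts" layout of an IP address followed by one or more names on
each line. The function name find_duplicate_aliases is hypothetical and
is not part of StarCluster; it only reports names that map to more than
one IP address and does not say which entry is the stale one.

```python
from collections import defaultdict


def find_duplicate_aliases(hosts_text):
    """Return {name: [ips]} for every host name mapped to more than one IP."""
    ips_by_name = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Not an "IP name [alias ...]" entry (e.g. an elision line).
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            if ip not in ips_by_name[name]:
                ips_by_name[name].append(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


# The two entries quoted above, used as sample input.
sample = """\
10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
"""
print(find_duplicate_aliases(sample))
# prints {'node016': ['10.76.91.4', '10.112.209.34']}
```

To check a live node, pass the contents of "/etc/hosts" to the function
instead of the sample string. Names such as "localhost" may legitimately
appear for both an IPv4 and an IPv6 address, so those hits can be ignored.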
With Regards,
Joseph
--
Kyeong Soo (Joseph) Kim, Ph.D.
Senior Lecturer in Networking
Room 112, Digital Technium
Multidisciplinary Nanotechnology Centre, College of Engineering
Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
TEL: +44 (0)1792 602024
EMAIL: k.s.kim_at_swansea.ac.uk
HOME: http://iat-hnrl.swan.ac.uk/ (group)
http://iat-hnrl.swan.ac.uk/~kks/ (personal)
On Mon, Jan 31, 2011 at 9:09 AM, Kyeong Soo (Joseph) Kim
<kyeongsoo.kim_at_gmail.com> wrote:
> Hello Rajat,
> I am very interested in your work on elastic load balancing; I remember
> that you posted some graphs of early results in the past and that you were
> working on your MSc thesis.
> In fact, this new feature will be critical for my current research, which
> requires roughly 300 to 400 independent simulation runs, and I greatly
> appreciate your contribution to StarCluster.
> By the way, I wonder whether you have published this work in any
> conferences or journals yet.
> Regards,
> Joseph
>
>
> On Fri, Jan 28, 2011 at 6:31 PM, Rajat Banerjee <rbanerj_at_fas.harvard.edu>
> wrote:
>>
>> Hi Archie,
>> Yes, there is ELB built into the latest releases of StarCluster. I wrote
>> it, so feel free to write me (+ the list) with any questions.
>> The docs at
>> http://web.mit.edu/stardev/cluster/docs/index.html
>> haven't been updated in a while. There is a documentation page for the
>> load balancer in the code base; see
>> /starcluster/StarCluster/docs/sphinx/load_balancer.rst
>> That doc should have all of the information you need, and it is readable
>> as plain text.
>> Typically, this is how I fire up the load balancer:
>> starcluster bal <cluster_tag> -m <MAX_NODES you want> -n <MIN_NODES you want>
>> It will poll the cluster every 60 seconds and make decisions. The
>> decisions are described in load_balancer.rst. There is a visualizer which
>> makes 6 graphs with matplotlib to show you how many nodes are working, how
>> many jobs are running, queued, avg load, etc, but the visualizer still needs
>> a little bit of work.
>> Hope that helps, and feel free to send back questions.
>> Rajat Banerjee
>>
>> On Fri, Jan 28, 2011 at 12:29 PM, <starcluster-request_at_mit.edu> wrote:
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Archie Russell <archier_at_gmail.com>
>>> To: starcluster_at_mit.edu
>>> Date: Thu, 27 Jan 2011 11:40:00 -0800
>>> Subject: [StarCluster] Starcluster and elastic load balancing
>>>
>>> Hi,
>>> Online it says StarCluster has Elastic Load Balancing built into the
>>> latest code version on GitHub. How would I go about using this? How
>>> does it work, e.g. when does it fire up new nodes and when does it shut
>>> them down?
>>> Thanks,
>>> Archie
>>> _______________________________________________
>>> StarCluster mailing list
>>> StarCluster_at_mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>
>>
>
>
Received on Tue Mar 15 2011 - 13:27:16 EDT