Adding and Removing Nodes

StarCluster has support for manually shrinking and expanding the size of your cluster based on your resource needs. For example, you might start out with 10-nodes and realize that you only really need 5 or the reverse case where you start 5 nodes and find out you need 10. In these cases you can use StarCluster’s addnode and removenode commands to scale the size of your cluster to your needs.

Note

The examples below assume we have a 1-node cluster running called mycluster.

Adding Nodes

To add nodes to a running cluster use the addnode command. This command takes a cluster tag as an argument and will automatically add a new node to the cluster:

$ starcluster addnode mycluster
StarCluster - (http://web.mit.edu/starcluster)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Launching node(s): node001
>>> Waiting for node(s) to come up... (updating every 30s)
>>> Waiting for open spot requests to become active...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring hostnames...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring NFS...
>>> Mounting shares for node node001
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user: myuser
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring passwordless ssh for root
>>> Using existing key: /root/.ssh/id_rsa
>>> Configuring passwordless ssh for myuser
>>> Using existing key: /home/myuser/.ssh/id_rsa
>>> Adding node001 to SGE

The addnode command auto-generates an alias for the new node(s). In the above example mycluster is a single node cluster. In this case addnode automatically added a new node and gave it an alias of node001. If we added additional nodes they would be named node002, node003, and so on.

If you’d rather manually specify an alias for the new node(s) use the --alias (-a) option:

$ starcluster addnode -a mynewnode mycluster

It is also possible to add multiple nodes using the --num-nodes (-n) option:

$ starcluster addnode -n 5 mycluster

The above command will add five additional nodes to mycluster auto-generating the node aliases. To specify aliases for all five nodes simply specify a comma separated list to the -a option:

$ starcluster addnode -n 5 -a n1,n2,n3,n4,n5 mycluster

Once the addnode command has completed successfully the new nodes will show up in the output of the listclusters command:

$ starcluster listclusters mycluster

You can login directly to a new node by alias:

$ starcluster sshnode mycluster mynewnode

The addnode command has additional options for customizing the new node’s instance type, AMI, spot bid, and more. See the help menu for a detailed list of all available options:

$ starcluster addnode --help

Re-adding a Node

If you’ve previously attempted to add a node and it failed due to a plugin error or other bug or if you used the removenode command with the -k option and wish to re-add the node to the cluster without launching a new instance you can use the -x option:

$ starcluster addnode -x -a node001 mycluster

Note

The -x option requires the -a option

This will attempt to add or re-add node001 to mycluster using the existing instance rather than launching a new instance. If no instance exists with the alias specified by the -a option an error is reported. You can also do this for multiple nodes:

$ starcluster addnode -x -a mynode1,mynode2,mynode3 mycluster

Removing Nodes

To remove nodes from an existing cluster use the removenode command. This command takes at least two arguments: the cluster tag representing the cluster you want to remove nodes from and a node alias:

$ starcluster removenode mycluster node001
StarCluster - (http://web.mit.edu/starcluster)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Removing node node001 (i-8bec7ce5)...
>>> Removing node001 from SGE
>>> Removing node001 from known_hosts files
>>> Removing node001 from /etc/hosts
>>> Removing node001 from NFS
>>> Canceling spot request sir-3567ba14
>>> Terminating node: node001 (i-8bec7ce5)

The above command removes node001 from mycluster by removing the node from the Sun Grid Engine queuing system, from each node’s ssh known_hosts files, from each node’s /etc/hosts file, and from all NFS shares. If you’re using plugins with your cluster they will be called to remove the node. Once the node has been removed from the cluster the node is terminated. If the node is a spot instance, as it is in the above example, the spot instance request will also be cancelled.

You can also remove multiple nodes by providing a list of aliases:

$ starcluster removenode mycluster node001 node002 node003

Remove Without Terminating

If you’d rather not terminate the node(s) after removing from the cluster to test plugins, for example, use the --keep-instance (-k) option:

$ starcluster removenode -k mycluster node001 node002 node003

This will remove the nodes from the cluster but leave the instances running. This can be useful, for example, when testing on_add_node methods in a StarCluster plugin.

Table Of Contents

Previous topic

Plugin System

Next topic

Elastic Load Balancer