Re: Starcluster start cluster stuck at Configuring passwordless ssh for root
Yes, it is consistently reproducible: this has happened every time I have tried to start the cluster. It happens with both the 32-bit and 64-bit AMIs (ami-899d49e0 and ami-999d49f0). Using the 'restart' command successfully reboots the cluster, but configuration then gets stuck at the same point. I also created a new SSH key following the procedure in the quick-start and updated the cluster template to use the newly generated key. My config is:
[global]
DEFAULT_TEMPLATE=smallcluster
[aws info]
AWS_ACCESS_KEY_ID = ****
AWS_SECRET_ACCESS_KEY = ******
AWS_USER_ID=*-*-*
[key mykey]
KEY_LOCATION=~/.ssh/mykey.rsa
[cluster smallcluster]
KEYNAME = mykey
CLUSTER_SIZE = 2
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-999d49f0
NODE_INSTANCE_TYPE = m1.small
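Since the key was regenerated and the template re-pointed at it, one thing that can be ruled out locally is a mismatch between the KEYNAME in the cluster template and the [key ...] sections. The following is only a rough sketch in Python, not part of StarCluster: the function name check_key_settings is made up for illustration, and it only assumes the INI-style layout shown in the config above. StarCluster already validates the template itself (see "Cluster template settings are valid" in the log), so this does not diagnose the hang; it just double-checks the key wiring.

```python
import configparser
import os


def check_key_settings(config_text):
    """Return a list of problems found in the key-related settings.

    Hypothetical helper for illustration only; it assumes section names
    of the form "[key NAME]" and "[cluster NAME]" as in the config above.
    """
    # interpolation=None so that a '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Collect KEY_LOCATION for every "[key NAME]" section.
    key_locations = {}
    for section in parser.sections():
        parts = section.split()
        if len(parts) == 2 and parts[0] == "key":
            key_locations[parts[1]] = parser.get(
                section, "KEY_LOCATION", fallback=None
            )

    problems = []
    for section in parser.sections():
        parts = section.split()
        if len(parts) != 2 or parts[0] != "cluster":
            continue
        keyname = parser.get(section, "KEYNAME", fallback=None)
        if keyname is None:
            problems.append("[%s] has no KEYNAME" % section)
        elif keyname not in key_locations:
            problems.append(
                "[%s] refers to undefined key '%s'" % (section, keyname)
            )
        elif key_locations[keyname] is None:
            problems.append("[key %s] has no KEY_LOCATION" % keyname)
        else:
            # Expand "~" the same way a shell would before checking the file.
            path = os.path.expanduser(key_locations[keyname])
            if not os.path.isfile(path):
                problems.append("key file not found: %s" % path)
    return problems
```

To use it, read the config file into a string and pass it in; an empty list means the KEYNAME, the [key ...] section, and the key file on disk all line up.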
On Mar 30, 2012, at 4:27 PM, Justin Riley wrote:
> Hi Jonathan,
>
> Sorry you're having issues. Are you able to consistently reproduce this
> issue? Also which AMI are you using? I would also try the 'restart'
> command which will reboot all instances and reconfigure the cluster.
>
> ~Justin
>
> On Fri, Mar 30, 2012 at 04:19:01PM -0400, Jonathan Goodson wrote:
>> I followed the Quick-Start instructions at http://web.mit.edu/star/cluster/docs/latest/quickstart.html almost exactly, only changing a couple of names. When I try to start the cluster, everything works fine at first: it starts up two m1.small instances in my EC2 account and successfully goes through the whole startup until it gets to "Configuring passwordless ssh for root". There it stops and stays indefinitely (I stopped it after 45 minutes and terminated the cluster).
>>
>> I don't even know where to start to try to figure out what is going on since it doesn't give me any error and I am not sure how to debug the script.
>>
>>
>> computer:~ jonathan$ starcluster start myfirstcluster
>> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
>> Software Tools for Academics and Researchers (STAR)
>> Please submit bug reports to starcluster_at_mit.edu
>>
>> >>> Using default cluster template: smallcluster
>> >>> Validating cluster template settings...
>> >>> Cluster template settings are valid
>> >>> Starting cluster...
>> >>> Launching a 2-node cluster...
>> >>> Creating security group @sc-myfirstcluster...
>> Reservation:r-a95adbca
>> >>> Waiting for cluster to come up... (updating every 30s)
>> >>> Waiting for all nodes to be in a 'running' state...
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Waiting for SSH to come up on all nodes...
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Waiting for cluster to come up took 1.284 mins
>> >>> The master node is ec2-50-16-158-217.compute-1.amazonaws.com
>> >>> Setting up the cluster...
>> >>> Configuring hostnames...
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Creating cluster user: None (uid: 1001, gid: 1001)
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Configuring scratch space for user(s): sgeadmin
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Configuring /etc/hosts on each node
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Starting NFS server on master
>> >>> Configuring NFS exports path(s):
>> /home
>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> >>> Setting up NFS took 0.078 mins
>> >>> Configuring passwordless ssh for root
Received on Mon Apr 02 2012 - 13:27:09 EDT