StarCluster - Mailing List Archive

Re: [Starcluster] Starcluster hangs at Creating Cluster User

From: Justin Riley <no email>
Date: Fri, 16 Apr 2010 09:59:58 -0400

Hi Dan,

Just following up for others who might be interested. This is due to
StarCluster recursively chown'ing all mounted EBS volumes so that they are
owned by cluster_user. Unfortunately, with many GB of data, this takes
*forever*.
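
To illustrate why this is slow and one possible way around it, here is a
minimal sketch. It is an assumption for illustration only, not StarCluster's
actual code or the eventual fix. The helper names (iter_tree,
chown_mismatched) are hypothetical. The idea is to skip the walk entirely
when the mount point is already owned by the target user, and otherwise to
change ownership only on entries that do not already match:

```python
import os


def iter_tree(root):
    """Yield root itself, then every directory and file beneath it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def chown_mismatched(root, uid, gid, trust_root=True):
    """Chown only entries under root that are not already uid:gid.

    If trust_root is True and root itself is already owned by uid:gid,
    assume (heuristically) that the volume was set up on an earlier run
    and skip the walk. Returns the number of entries changed.
    """
    st = os.lstat(root)
    if trust_root and st.st_uid == uid and st.st_gid == gid:
        return 0

    changed = 0
    for path in iter_tree(root):
        st = os.lstat(path)
        if st.st_uid != uid or st.st_gid != gid:
            # lchown changes a symlink itself rather than its target
            os.lchown(path, uid, gid)
            changed += 1
    return changed
```

Note that with trust_root=False this still has to stat every entry, so it
only saves the ownership writes; the trust_root shortcut is what avoids the
long walk, at the cost of assuming the rest of the volume matches its mount
point.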

I'm working on a solution to this.

~Justin

On 04/15/2010 02:18 PM, Dan Yamins wrote:
> Hi,
>
> I'm using Starcluster from the git repo. I think I have everything
> configured properly. But when I try to start a 1-node cluster, the process
> hangs at the "create user" step:
>
>>>> Validating cluster settings...
>>>> Cluster settings are valid
>>>> Starting cluster...
>>>> Launching a 1-node cluster...
>>>> Launching master node...
>>>> Master AMI: ami-a19e71c8
>>>> Creating security group @sc-testcluster...
> Reservation:r-56c3ca3e
>>>> Waiting for cluster to start...
>>>> The master node is ec2-184-73-33-230.compute-1.amazonaws.com
>
>>>> Attaching volume vol-c3d927aa to master node...
>>>> Setting up the cluster...
>>>> Mounting EBS volume vol-c3d927aa on /home...
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>>> Creating cluster user: gotdata
>
> ... and that's where it hangs.
>
> I CAN log into the individual nodes -- both as master AND as "gotdata"
> -- using passwordless ssh. Here's what the /etc/hosts file looks like:
>
> 127.0.0.1 localhost.localdomain localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
> Since this is a 1-node cluster, I can't test the passwordless login.
>
> I can reproduce this problem with both the 32-bit and 64-bit base
> starcluster AMIs as well as the AMIs that I created from those.
>
> When I try to create a 2-node cluster, the process hangs a step later:
>
>>>> Validating cluster settings...
>>>> Cluster settings are valid
>>>> Starting cluster...
>>>> Launching a 2-node cluster...
>>>> Launching master node...
>>>> Master AMI: ami-f129c798
>>>> Creating security group @sc-testcluster...
> Reservation:r-e8d9d080
>>>> Launching worker nodes...
>>>> Node AMI: ami-f129c798
> Reservation:r-ead9d082
>>>> Waiting for cluster to start...
>>>> The master node is ec2-184-73-111-239.compute-1.amazonaws.com
>>>> Attaching volume vol-c3d927aa to master node...
>>>> Setting up the cluster...
>>>> Mounting EBS volume vol-c3d927aa on /home...
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>>> Creating cluster user: gotdata
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>
> .... and there it hangs.
>
> In this case, I can:
> -- log into the master and worker nodes as root: e.g. "starcluster
> sshmaster testcluster" and "starcluster sshnode testcluster 1" work fine
> -- log into the master as user gotdata, but NOT into the other worker
> node, e.g. "starcluster sshnode -u gotdata testcluster 0" works but
> "starcluster sshnode -u gotdata testcluster 1" DOESN'T.
>
>
> Thanks!
> Dan

Received on Fri Apr 16 2010 - 10:00:00 EDT