StarCluster - Mailing List Archive

Re: grid engine not initialize on gpu hvm image

From: Jesse Lu <no email>
Date: Thu, 13 Sep 2012 10:32:43 -0700

Thanks for the pointer Justin!

On Thu, Sep 13, 2012 at 10:29 AM, Justin Riley <jtriley_at_mit.edu> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Jesse,
>
> Sorry for the delay in responding, but glad you figured out that you
> need all-Ubuntu AMIs for both HVM and non-HVM nodes. That said, keep in
> mind that only HVM nodes are on the high-speed network IIRC, which means
> all traffic between the master and the nodes (e.g. NFS) will be
> suboptimal compared to the performance of an all-HVM cluster.
>
> ~Justin
>
>
> On 08/27/2012 05:59 PM, Jesse Lu wrote:
> > Okay, figured out that using ami-999d49f0 for non-HVM master and
> > ami-4583572c for HVM nodes makes SGE work well. It's my fault for
> > not looking at the available public starcluster images carefully
> > enough.
> >
> >
> >
> > On Mon, Aug 27, 2012 at 2:26 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
> >
> > Sorry for the spam, but here's another follow-up.
> >
> > I found that this only happens when I use a non HVM-EBS AMI for
> > the master, but an HVM-EBS AMI for the nodes.
> >
> > This is probably because StarCluster copies the sge install from
> > the master to the nodes, and this doesn't play nice when the nodes
> > are CentOS based but the master is Ubuntu based.
> >
> > Any ideas for a work-around?
> >
> >
> > On Mon, Aug 27, 2012 at 2:07 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
> >
> > Follow-up,
> >
> > Here are the contents of the installation log file (for grid
> > engine)
> >
> > cat /opt/sge6/default/common/install_logs/execd_install_node001_2012-08-27_14:04:29.log
> >
> >
> >
> > Your $SGE_ROOT directory: /opt/sge6
> >
> >
> > Using cell: >default<
> >
> >
> >
> >
> >
> > Using local execd spool directory
> > [/opt/sge6/default/spool/exec_spool_local]
> >
> > Creating local configuration for host >node001<
> > sgeadmin@node001 modified "node001" in configuration list
> > Local configuration for host >node001< created.
> >
> > Host >master< already in submit host list!
> > Host >node001< already in submit host list!
> >
> >
> > starting sge_execd
> >
> >
> > No modification because "node001" already exists in "hostlist" of "hostgroup"
> > root@node001 modified "@allhosts" in host group list
> > root@node001 modified "all.q" in cluster queue list
> >
> > got select error: Connection refused
> > got select error: closing "node001/execd/1"
> > Execd on host node001 is not started!
> >
> >
> > On Mon, Aug 27, 2012 at 1:37 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
> >
> > ami-12b6477b produces the following error on cluster startup:
> >
> > !!! ERROR - command 'cd /opt/sge6 && TERM=rxvt ./inst_sge -x -noremote -auto ./ec2_sge.conf' failed with status 1
> >
> > I'm guessing the sge6 installation is faulty? Can anyone help?
> > Thanks!
> >
> > Jesse
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster_at_mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
> >
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.19 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAlBSF/4ACgkQ4llAkMfDcrlSwwCbB5lJLmj4GY9rriY9jfxNdqO3
> s2UAn13+cEYu9bCqx6jiAP/wuPdetm+D
> =Dyis
> -----END PGP SIGNATURE-----
>
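
For anyone finding this thread later: the working combination reported above (ami-999d49f0 for a non-HVM master, ami-4583572c for HVM nodes) could probably be written as a cluster template along the lines below. This is only a sketch. The two AMI IDs come from the thread; the template name, key name, cluster size, and both instance types are placeholders I am assuming for illustration, so substitute your own.

```ini
# Hypothetical cluster template. Only the two AMI IDs are taken from this
# thread; every other value is an assumed placeholder.
[cluster gpucluster]
KEYNAME = mykey
CLUSTER_SIZE = 2
CLUSTER_USER = sgeadmin

# Non-HVM Ubuntu AMI for the master (ID from the thread)
MASTER_IMAGE_ID = ami-999d49f0
# Assumed instance type for the master
MASTER_INSTANCE_TYPE = m1.large

# HVM Ubuntu AMI for the compute nodes (ID from the thread)
NODE_IMAGE_ID = ami-4583572c
# Assumed GPU (HVM) instance type for the nodes
NODE_INSTANCE_TYPE = cg1.4xlarge
```

Since StarCluster copies the SGE install from the master to the nodes, the point of the split is that both images are Ubuntu based. Per Justin's note, master-to-node traffic such as NFS will likely still be slower than on an all-HVM cluster.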
Received on Thu Sep 13 2012 - 13:32:45 EDT
This archive was generated by hypermail 2.3.0.
