StarCluster - Mailing List Archive

Re: SGE issue with hostnames

From: Justin Riley <no email>
Date: Tue, 26 Jul 2011 11:26:22 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Robert,

So, after launching a test cluster using ami-a5c42dcc, I can't seem to
reproduce the issue. The qhost, qconf -mq all.q, qstat, etc. commands all
work fine for me, and I've tried them several times. /etc/nsswitch.conf
also looks fine.
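One way to run that check on a node is the minimal sketch below. It assumes a POSIX shell, getent(1), and the usual `hosts:`/`networks:` entries in /etc/nsswitch.conf; the helper name `files_before_dns` is hypothetical and not part of StarCluster or SGE. It verifies that "files" is consulted before "dns", then resolves the node's own name to an address and back, so a name mismatch like the one in the SGE error becomes visible:

```sh
#!/bin/sh
# Sketch: check name-resolution order on a StarCluster node.
# Assumes a POSIX shell, getent(1) and the usual "hosts:"/"networks:" lines
# in /etc/nsswitch.conf. files_before_dns is a hypothetical helper, not part
# of StarCluster or SGE.

# files_before_dns ENTRY
# Succeeds if "files" is listed before "dns" in an nsswitch.conf entry such
# as "hosts: files dns"; fails if dns comes first or files is missing.
files_before_dns() {
    printf '%s\n' "$1" | awk '{
        sub(/^[^:]*:/, "")               # drop the "hosts:"/"networks:" label
        for (i = 1; i <= NF; i++) {
            if ($i == "files") exit 0    # local files (/etc/hosts) come first
            if ($i == "dns")   exit 1    # DNS would be asked first
        }
        exit 1                           # "files" not listed at all
    }'
}

for db in hosts networks; do
    entry=$(grep "^$db:" /etc/nsswitch.conf 2>/dev/null || true)
    if files_before_dns "$entry"; then
        echo "$db: files before dns ($entry)"
    else
        echo "$db: WARNING, files not consulted before dns (${entry:-no entry})"
    fi
done

# Resolve this node's name to an address, then the address back to a name.
# This loosely mirrors SGE's commlib host-name check; both ends should match.
name=$(uname -n)                         # same value hostname(1) prints
ip=$(getent hosts "$name" | awk '{ print $1; exit }')
back=$(getent hosts "$ip" | awk '{ print $2; exit }')
echo "$name -> ${ip:-unresolved} -> ${back:-unresolved}"
```

On a correctly configured master, both ends of the last line should read master; an EC2 internal name such as domU-12-31-39-09-80-C1.compute-1.internal on the right would suggest that DNS is being consulted before /etc/hosts.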

Have you looked in the log file to see whether there were errors while
setting up the hostname? If there were, you would also have seen them on
screen while launching the cluster. The debug log is in
/tmp/starcluster-debug-<username>.log.
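The following is a minimal sketch of such a search, run on the machine where you invoke the starcluster command. The helper name `scan_starcluster_log` and the `error|hostname` pattern are illustrative assumptions, since the log's exact wording may differ:

```sh
#!/bin/sh
# Sketch: pull error- or hostname-related lines out of the StarCluster
# debug log. The default path follows /tmp/starcluster-debug-<username>.log;
# the search pattern is an assumption, since the exact log wording may differ.
# scan_starcluster_log is a hypothetical helper, not a StarCluster command.

# scan_starcluster_log [FILE]
# Prints matching lines with their line numbers. Returns 2 if FILE is
# unreadable, 1 if nothing matched (grep's convention), 0 otherwise.
scan_starcluster_log() {
    log=${1:-/tmp/starcluster-debug-$(id -un).log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    grep -n -i -E 'error|hostname' "$log"
}
```

Calling `scan_starcluster_log` with no argument reads your own debug log; passing a path scans a different file.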

Also, have you tried restarting the cluster with the restart command?
This will completely reconfigure the cluster (including reinstalling SGE):

$ starcluster restart mycluster

Let me know if this fixes things for you.

~Justin


On 07/26/2011 11:02 AM, Justin Riley wrote:
> Hi Robert,
>
> Sorry to hear you're having issues. The 0.92rc2 version sets the
> (user-friendly) hostnames itself in /etc/hostname and /etc/hosts as
> you've seen. The only thing I can think of that would cause this
> issue is /etc/nsswitch.conf not preferring "files" before "dns" for
> hosts and networks. I'm launching a small test cluster to check, but I
> would bet this is the problem.
>
> If that's the case, I can make a patch for it. For your immediate
> needs, do you specifically need the 9.04 AMI? The latest AMI is
> 10.04, which should work fine. You can browse the latest available
> AMIs using the 'listpublic' command:
>
> $ starcluster listpublic
>
> Let me know,
>
> ~Justin
>
> On 07/25/2011 11:25 PM, Robert Tomkiewicz wrote:
>> Hi there,
>>
>> I started a 4-node EC2 cluster using StarCluster 0.92rc2 and
>> ami-a5c42dcc, the standard StarCluster 9.04 x64 AMI.
>>
>> I ran into the following issue while doing some basic SGE setup.
>> At first qconf worked fine; then, a few minutes later:
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qconf -sq all.q
>> error: commlib error: access denied (client IP resolved to host
>> name "domU-12-31-39-09-80-C1.compute-1.internal". This is not
>> identical to clients host name "master").
>>
>> After issuing
>>
>> root@master:~# hostname master
>>
>> I was able to proceed normally, and launch my sge jobs. They were
>> running normally, confirmed by the output of qstat.
>>
>> However, some minutes later, when checking on them with another
>> qstat, I got the same thing again.
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host
>> name "domU-12-31-39-09-80-C1.compute-1.internal". This is not
>> identical to clients host name "master").
>>
>> Resetting the hostname was to no avail:
>>
>> root@master:~# hostname master
>> root@master:~# hostname master
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host
>> name "domU-12-31-39-09-80-C1.compute-1.internal". This is not
>> identical to clients host name "master").
>>
>> So I tried this:
>>
>> root@master:~# hostname domU-12-31-39-09-80-C1.compute-1.internal
>>
>> which yielded the reverse error:
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host
>> name "master". This is not identical to clients host name
>> "domU-12-31-39-09-80-C1.compute-1.internal")
>>
>> Setting the hostname back to master ("hostname master") at this
>> point restores correct operation for a few minutes.
>>
>>
>> It seems clear the problem has to do with the node having two
>> hostnames, but where are they set? Has anyone else had a similar
>> problem?
>>
>> Thank you,
>>
>> Robert Tomkiewicz
>>
>>
>>
>> /etc/hostname is simply
>>
>> master
>>
>>
>> /etc/hosts is below:
>>
>> 127.0.0.1 localhost
>>
>> # The following lines are desirable for IPv6 capable hosts
>> ::1 ip6-localhost ip6-loopback
>> fe00::0 ip6-localnet
>> ff00::0 ip6-mcastprefix
>> ff02::1 ip6-allnodes
>> ff02::2 ip6-allrouters
>> ff02::3 ip6-allhosts
>> 10.210.135.47 master
>> 10.66.83.219 node001
>> 10.193.155.175 node002
>> 10.206.70.15 node003
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4u3J0ACgkQ4llAkMfDcrk+0QCfRzaSi0+C2hwM8Ac4NU3zIt93
Qg4AnRd/clZh5pITlPHslNwQFMmffooQ
=+FiL
-----END PGP SIGNATURE-----
Received on Tue Jul 26 2011 - 11:26:23 EDT
This archive was generated by hypermail 2.3.0.
