StarCluster - Mailing List Archive

Re: SGE issue with hostnames

From: Justin Riley <no email>
Date: Tue, 26 Jul 2011 11:02:27 -0400

Hi Robert,

Sorry to hear you're having issues. The 0.92rc2 version sets the
(user-friendly) hostnames itself in /etc/hostname and /etc/hosts as
you've seen. The only thing I can think of that would cause this issue
is /etc/nsswitch.conf not preferring "files" before "dns" for hosts and
networks. I'm launching a small test cluster to check but I would bet
this is the problem.
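
If you want to check the lookup order yourself in the meantime, here is a
minimal Python sketch. It is not part of StarCluster, and the function names
lookup_order and files_before_dns are made up for illustration. It only
parses the text of nsswitch.conf and reports whether "files" is listed ahead
of "dns" for a given database; it assumes the usual "database: source ..."
line layout and ignores bracketed action items such as [NOTFOUND=return].

```python
import re


def lookup_order(nsswitch_text, database="hosts"):
    """Return the ordered list of sources for one database line.

    nsswitch_text is the full text of an nsswitch.conf file.
    Returns an empty list if the database has no entry.
    """
    for raw in nsswitch_text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, _, sources = line.partition(":")
        if name.strip() != database:
            continue
        # Remove action items like [NOTFOUND=return]; keep only source names.
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        return sources.split()
    return []


def files_before_dns(sources):
    """True if 'files' is present and comes before 'dns' (or 'dns' is absent)."""
    if "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")


def check_file(path="/etc/nsswitch.conf"):
    """Print the lookup order for the hosts and networks databases."""
    with open(path) as fh:
        text = fh.read()
    for db in ("hosts", "networks"):
        order = lookup_order(text, db)
        status = "ok" if files_before_dns(order) else "files is not ahead of dns"
        print("%s: %s (%s)" % (db, " ".join(order) or "<no entry>", status))
```

Calling check_file() on the master node prints one line each for hosts and
networks. If either reports that files is not ahead of dns, that would be
consistent with the reverse lookup returning the EC2 internal name instead of
"master".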

I can make a patch for this if this is the case. For your immediate
needs, do you need the 9.04 AMI specifically? The latest AMI is 10.04
which should work fine. You can browse the latest available AMIs using
the 'listpublic' command:

$ starcluster listpublic

Let me know,

~Justin

On 07/25/2011 11:25 PM, Robert Tomkiewicz wrote:
> Hi there,
>
> I started a 4-node EC2 cluster using 0.92rc2 and ami-a5c42dcc, the
> standard StarCluster 9.04 x64 AMI.
>
> I ran into the following issue while doing some basic sge setup. At
> first qconf worked fine, then a few minutes later...
>
> root_at_master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qconf -sq all.q
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> after issuing
>
> root_at_master: ~# hostname master
>
> I was able to proceed normally, and launch my sge jobs. They were
> running normally, confirmed by the output of qstat.
>
> However, some minutes later, when checking on them with another qstat, I
> got the same thing again.
>
> root_at_master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> Resetting the hostname was to no avail.
>
> root_at_master: ~ # hostname
> master
> root_at_master: ~ # hostname master
> root_at_master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> So I tried this
>
> root_at_master: ~# hostname domU-12-31-39-09-80-C1.compute-1.internal
>
> which yielded, vice versa...
>
> root_at_master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "master". This is not identical to clients host name
> "domU-12-31-39-09-80-C1.compute-1.internal")
>
> Setting the hostname back to master ("hostname master") at this point
> yields correct operation for a few minutes.
>
>
> It seems clear the problem has to do with doubled hostnames, but where
> are they set? Has anyone else had a similar problem?
>
> Thank you,
>
> Robert Tomkiewicz
>
>
>
> /etc/hostname is simply
>
> master
>
>
> /etc/hosts is below:
>
> 127.0.0.1 localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 10.210.135.47 master
> 10.66.83.219 node001
> 10.193.155.175 node002
> 10.206.70.15 node003
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Tue Jul 26 2011 - 11:02:31 EDT