StarCluster - Mailing List Archive

Re: SGE issue with hostnames

From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
Date: Tue, 26 Jul 2011 08:37:12 -0400

In $SGE_ROOT/$SGE_CELL/common,
create a file named host_aliases containing lines of the form
<hostname>.privatenet <hostname>.pubnet
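Here is a minimal Python sketch of the suggestion above. It is not part of the original thread. The helper renders host_aliases lines and locates the file under $SGE_ROOT/$SGE_CELL/common.

It rests on a few assumptions:
- It follows the host_aliases convention that the first name on each line is the name SGE uses, and the remaining names are aliases for it.
- The function names are hypothetical.
- Falling back to a cell named `default` when SGE_CELL is unset is an assumption about this install.
- Mapping `master` to the EC2 internal name from the error messages below is an illustration, not a tested fix.

```python
import os
from typing import Dict, List


def format_host_aliases(aliases: Dict[str, List[str]]) -> str:
    """Render host_aliases content: one line per host, with the name SGE
    should use first, followed by its aliases, separated by spaces."""
    lines = []
    for canonical, others in aliases.items():
        lines.append(" ".join([canonical, *others]))
    return "\n".join(lines) + "\n"


def host_aliases_path() -> str:
    """Return $SGE_ROOT/$SGE_CELL/common/host_aliases.

    Assumption: the cell is named 'default' when SGE_CELL is unset.
    Raises KeyError if SGE_ROOT is not set.
    """
    sge_root = os.environ["SGE_ROOT"]
    sge_cell = os.environ.get("SGE_CELL", "default")
    return os.path.join(sge_root, sge_cell, "common", "host_aliases")


if __name__ == "__main__":
    # Hypothetical mapping: treat the EC2 internal name as an alias of "master".
    content = format_host_aliases(
        {"master": ["domU-12-31-39-09-80-C1.compute-1.internal"]}
    )
    # Print instead of writing, so nothing is changed until the names are checked.
    print(content, end="")
```

Printing rather than writing keeps the sketch free of side effects. Once the names have been checked, the output could be appended to the file returned by host_aliases_path() as root.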


On 7/25/2011 11:25 PM, Robert Tomkiewicz wrote:
> Hi there,
>
> I started a 4-node EC2 cluster using StarCluster 0.92rc2 and ami-a5c42dcc,
> the standard StarCluster 9.04 x64 AMI.
>
> I ran into the following issue while doing some basic SGE setup. At
> first qconf worked fine, then a few minutes later:
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qconf -sq all.q
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> After issuing
>
> root@master:~# hostname master
>
> I was able to proceed normally and launch my SGE jobs. They ran
> normally, as the output of qstat confirmed.
>
> However, some minutes later, when I checked on them with another qstat,
> I got the same error again.
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> Resetting the hostname did not help:
>
> root@master:~# hostname
> master
> root@master:~# hostname master
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
> clients host name "master").
>
> So I tried this:
>
> root@master:~# hostname domU-12-31-39-09-80-C1.compute-1.internal
>
> which yielded the reverse error:
>
> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
> error: commlib error: access denied (client IP resolved to host name
> "master". This is not identical to clients host name
> "domU-12-31-39-09-80-C1.compute-1.internal")
>
> Setting the hostname back to master (hostname master) at this point
> restores correct operation for a few minutes.
>
>
> It seems clear that the problem has to do with the host having two
> different hostnames, but where are they set? Has anyone else had a
> similar problem?
>
> Thank you,
>
> Robert Tomkiewicz
>
>
>
> /etc/hostname is simply:
>
> master
>
>
> /etc/hosts is below:
>
> 127.0.0.1 localhost
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 10.210.135.47 master
> 10.66.83.219 node001
> 10.193.155.175 node002
> 10.206.70.15 node003
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
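Robert's question of where the two names come from can be probed from Python. The sketch below is hypothetical: the script name check_names.py and its functions are illustrations, not SGE tools.
- names_for_ip lists the names an /etc/hosts-style file assigns to an address.
- compare_names puts the name hostname reports next to the name a reverse lookup of a given IP returns. These are roughly the two names the commlib error compares.

The check is a plain string comparison, which may be stricter than SGE's own rules. The system resolver may also consult DNS as well as /etc/hosts.

```python
import socket
import sys
from typing import List, Tuple


def names_for_ip(hosts_text: str, ip: str) -> List[str]:
    """Return the names an /etc/hosts-style text assigns to `ip`, in file order."""
    names: List[str] = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == ip:
            names.extend(fields[1:])
    return names


def compare_names(ip: str) -> Tuple[str, str]:
    """Return (local hostname, primary name from a reverse lookup of `ip`).

    The reverse lookup goes through the system resolver, so it may use
    /etc/hosts, DNS, or both, depending on the resolver configuration.
    """
    local = socket.gethostname()
    resolved = socket.gethostbyaddr(ip)[0]
    return local, resolved


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_names.py <client-ip>")
    else:
        ip = sys.argv[1]
        try:
            local, resolved = compare_names(ip)
        except OSError as exc:  # socket.herror and socket.gaierror subclass OSError
            print(f"reverse lookup failed: {exc}")
        else:
            status = "match" if local == resolved else "MISMATCH"
            print(f"hostname: {local}  {ip} -> {resolved}  ({status})")
```

Applied to the /etc/hosts above, names_for_ip maps 10.210.135.47 only to master. The domU-12-31-39-09-80-C1.compute-1.internal name does not appear anywhere in the file. That suggests, though does not prove, that when SGE reports that name it came from another resolver source, such as the EC2-provided DNS.

Running `python3 check_names.py <client-ip>` on the master, using the address its clients connect from, would show whether the reverse lookup agrees with hostname.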




Received on Tue Jul 26 2011 - 08:37:21 EDT
This archive was generated by hypermail 2.3.0.
