StarCluster - Mailing List Archive

Re: SGE issue with hostnames

From: Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
Date: Tue, 26 Jul 2011 08:38:19 -0400

Sorry, one more thing:
restart the master and all execd daemons after creating the file.
This host_aliases file should be present on all nodes.
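
A minimal sketch of these steps, under stated assumptions. A host_aliases line lists the unique host name first, followed by its aliases. The pairing used here (master as the unique name, the EC2 internal name from the error message as its alias) is an assumption, as are the helper name write_host_aliases, the scratch directory, and the sgemaster/sgeexecd script locations mentioned in the comments. Adjust all of them to the actual installation.

```shell
#!/bin/sh
# Sketch only: write a Grid Engine host_aliases file.
# Line format: <unique hostname> <alias> [<alias> ...]

# write_host_aliases DIR PRIMARY ALIAS...
# DIR is the directory holding host_aliases, normally
# "$SGE_ROOT/$SGE_CELL/common". Any existing file is overwritten.
write_host_aliases() {
    dir=$1; shift
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    printf '%s\n' "$*" > "$dir/host_aliases"
}

# Demo in a scratch directory so nothing real is touched.
# On a cluster, use: conf_dir="$SGE_ROOT/$SGE_CELL/common"
conf_dir="${TMPDIR:-/tmp}/sge_host_aliases_demo"
write_host_aliases "$conf_dir" master domU-12-31-39-09-80-C1.compute-1.internal
cat "$conf_dir/host_aliases"

# Afterwards, restart the daemons (assumed script names, not run here):
#   on the master:  "$SGE_ROOT/$SGE_CELL/common/sgemaster" stop, then start
#   on every node:  "$SGE_ROOT/$SGE_CELL/common/sgeexecd" stop, then start
```

The helper overwrites rather than appends, so with several hosts it may be simpler to edit the file by hand, one line per host.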



On 7/26/2011 8:37 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
> In $SGE_ROOT/$SGE_CELL/common,
> create a file named host_aliases containing:
> <hostname>.privatenet <hostname>.pubnet
>
>
> On 7/25/2011 11:25 PM, Robert Tomkiewicz wrote:
>> Hi there,
>>
>> I started a 4-node EC2 cluster using StarCluster 0.92rc2 and
>> ami-a5c42dcc, the standard StarCluster 9.04 x64 AMI.
>>
>> I ran into the following issue while doing some basic sge setup. At
>> first qconf worked fine, then a few minutes later...
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qconf -sq all.q
>> error: commlib error: access denied (client IP resolved to host name
>> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
>> clients host name "master").
>>
>> After issuing
>>
>> root@master: ~# hostname master
>>
>> I was able to proceed normally, and launch my sge jobs. They were
>> running normally, confirmed by the output of qstat.
>>
>> However, some minutes later, when checking on them with another
>> qstat, I got the same thing again.
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host name
>> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
>> clients host name "master").
>>
>> Resetting the hostname did not help.
>>
>> root@master: ~ # hostname
>> master
>> root@master: ~ # hostname master
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host name
>> "domU-12-31-39-09-80-C1.compute-1.internal". This is not identical to
>> clients host name "master").
>>
>> So I tried this
>>
>> root@master: ~# hostname domU-12-31-39-09-80-C1.compute-1.internal
>>
>> which yielded the reverse error:
>>
>> root@master:/mnt/NAS1/simulations/CDI_0615_amazon/logs# qstat -f
>> error: commlib error: access denied (client IP resolved to host name
>> "master". This is not identical to clients host name
>> "domU-12-31-39-09-80-C1.compute-1.internal")
>>
>> Setting the hostname back to master ("hostname master") at this point
>> yields correct operation for a few minutes.
>>
>>
>> It seems clear the problem has to do with doubled hostnames, but
>> where are they set? Has anyone else had a similar problem?
>>
>> Thank you,
>>
>> Robert Tomkiewicz
>>
>>
>>
>> /etc/hostname is simply
>>
>> master
>>
>>
>> /etc/hosts is below:
>>
>> 127.0.0.1 localhost
>>
>> # The following lines are desirable for IPv6 capable hosts
>> ::1 ip6-localhost ip6-loopback
>> fe00::0 ip6-localnet
>> ff00::0 ip6-mcastprefix
>> ff02::1 ip6-allnodes
>> ff02::2 ip6-allrouters
>> ff02::3 ip6-allhosts
>> 10.210.135.47 master
>> 10.66.83.219 node001
>> 10.193.155.175 node002
>> 10.206.70.15 node003
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
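
The mismatch described in the quoted message can be checked by comparing the local host name with the name an address maps to. The sketch below does this against a hosts-format file only. The helper names first_name_for_ip and names_agree and the scratch file are hypothetical, and the getent line in the closing comment is an assumption about what is available on the node. The name in the error message may have come from a source other than /etc/hosts (for example DNS), which this sketch does not query.

```shell
#!/bin/sh
# Sketch only: compare a host name with the name an IP maps to
# in a hosts-format file.

# first_name_for_ip HOSTS_FILE IP
# Prints the first name listed for IP in the given file.
first_name_for_ip() {
    awk -v ip="$2" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $1 == ip { print $2; exit }    # first name after the address
    ' "$1"
}

# names_agree LOCAL RESOLVED
# Succeeds only when both names are non-empty and identical.
names_agree() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Demo with the master entry from the quoted /etc/hosts.
demo_hosts="${TMPDIR:-/tmp}/hosts_demo"
printf '%s\n' '127.0.0.1 localhost' '10.210.135.47 master' > "$demo_hosts"

resolved=$(first_name_for_ip "$demo_hosts" 10.210.135.47)
if names_agree master "$resolved"; then
    echo "hosts file agrees: $resolved"
else
    echo "mismatch: hosts file says '$resolved'"
fi

# On a live node the same comparison could use the real sources, e.g.:
#   names_agree "$(hostname)" "$(getent hosts 10.210.135.47 | awk '{print $2}')"
```

If the hosts file agrees with the local host name but the error persists, the conflicting name is being supplied elsewhere, which is the case the host_aliases file is meant to cover.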




Received on Tue Jul 26 2011 - 08:38:28 EDT
This archive was generated by hypermail 2.3.0.
