StarCluster - Mailing List Archive

Fwd: StarCluster Digest, Vol 44, Issue 5

From: Sergio Mafra <no email>
Date: Tue, 7 May 2013 16:30:29 -0300

Hi fellows,

Any help with this?

All the best,

Sergio

---------- Forwarded message ----------
From: Sergio Mafra <sergiohmafra_at_gmail.com>
Date: Tue, May 7, 2013 at 4:17 PM
Subject: Re: [StarCluster] StarCluster Digest, Vol 44, Issue 5
To: Rajat Banerjee <rajatb_at_post.harvard.edu>


Hi Rajat,

I think the date problem is solved. Now we've got a new one. Check it
out:

ubuntu_at_domU-12-31-39-02-19-36:~$ starcluster loadbalance spotcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 5
Minimum cluster size: 1
Cluster growth rate: 1 nodes/iteration

>>> Loading full job history
*** WARNING - Failed to retrieve stats (1/5):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
    self.stat = self._get_stats()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 493, in _get_stats
    qacct = '\n'.join(master.ssh.execute(qacct_cmd))
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 538, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && qacct -j -b 201305071615' failed with status 1:
no jobs running since startup
/opt/sge6/default/common/accounting: No such file or directory
*** WARNING - Retrying in 60s

Note that I'm running MPICH2. This is part of my config file:

[cluster NewaveUbuntuHVM]
KEYNAME = MasterNode
CLUSTER_SIZE = 5
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
MASTER_IMAGE_ID = ami-7f1d8a16
NODE_IMAGE_ID = ami-411d8a28
NODE_INSTANCE_TYPE = cr1.8xlarge
PLUGINS = mpich2
VOLUMES = newave

All the best,

Sergio
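
In the traceback above, qacct exits with status 1 and itself reports "no jobs running since startup" together with a missing accounting file, which suggests the command failed only because there is no job history yet. A minimal sketch of how a caller might tell that case apart from a real failure follows. The helper name `is_empty_accounting_error` and the marker strings are assumptions for illustration (the markers are taken from the output above); this is not part of StarCluster.

```python
# Hypothetical helper, not part of StarCluster: decides whether a failed
# qacct call only means "there is no job history yet".
EMPTY_HISTORY_MARKERS = (
    "no jobs running since startup",
    "accounting: No such file or directory",
)


def is_empty_accounting_error(exit_status, output):
    """Return True if qacct failed only because no accounting data exists."""
    if exit_status == 0:
        # The command succeeded, so there is no error to classify.
        return False
    return any(marker in output for marker in EMPTY_HISTORY_MARKERS)
```

A caller could treat a True result as an empty job history and keep polling, while still raising on any other non-zero exit status.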


On Mon, May 6, 2013 at 10:42 AM, Sergio Mafra <sergiohmafra_at_gmail.com> wrote:

> Hi Rajat,
>
> Thanks so much for your help. I'll do as you said and report the results
> here.
>
> All the best,
>
> Sergio
>
>
> On Sun, May 5, 2013 at 2:48 PM, Rajat Banerjee <rajatb_at_post.harvard.edu> wrote:
>
>> Hi Sergio,
>> Sorry for the delayed response. Busy week at work. Adding starcluster
>> alias back, in case this helps other people in the future.
>>
>> Like I said, I'm not sure why your instance is coming up with PDT in the
>> EC2 instance, since from what I remember it would always return UTC.
>>
>> Is it possible for you to download the latest dev version if you haven't
>> tried that already?
>>
>> http://star.mit.edu/cluster/docs/latest/contribute.html
>>
>> Slightly different directions than the one you specified. Then, you can
>> modify this file:
>> starcluster/balancers/sge/__init__.py line 466
>> To replace UTC with PDT. Then run the ELB, and it'll run the latest code.
>> Let me know if that works, and we can file a bug and go through the formal
>> process of letting you switch the timezone. I'm guessing that you're pretty
>> familiar with python programming, but if you have more problems then feel
>> free to ask more questions.
>>
>> Best,
>> Rajat
>>
>>
>> On Tue, Apr 30, 2013 at 2:07 PM, Sergio Mafra <sergiohmafra_at_gmail.com> wrote:
>>
>>> Hi Rajat,
>>>
>>> Thanks so much for taking the time to find out where this error
>>> was. Nice!
>>>
>>> It's a bit hard to understand what is causing that, since I'm running
>>> the SC Controller on an instance in the same zone (us-east-1d) as the
>>> cluster it launched. So they should use the same time format, shouldn't they?
>>>
>>> What I did was download the code directly from the Git site and
>>> compile it as described in
>>> http://star.mit.edu/cluster/docs/latest/installation.html
>>>
>>> ???
>>>
>>> All the best,
>>>
>>> Sergio
>>>
>>>
>>> On Tue, Apr 30, 2013 at 2:18 PM, Rajat Banerjee <rajatb_at_post.harvard.edu> wrote:
>>>
>>>> Sergio,
>>>> I looked at the code that is causing your problems. It's this line:
>>>>
>>>> return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")
>>>>
>>>> where 'str' is the output of the *remote* call to date. My mac returns
>>>> this:
>>>> rbanerjee:~/starcluster/StarCluster/starcluster/balancers/sge $ date
>>>> Tue Apr 30 13:12:44 EDT 2013
>>>>
>>>> Which looks like it would time format OK. One oddity is that your time
>>>> format is returning PDT when UTC is expected:
>>>>
>>>> ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match
>>>> format '%a %b %d %H:%M:%S UTC %Y'
>>>>
>>>> Not sure what's causing the problems since it looks mostly right, but
>>>> feel free to tweak the time setting in the code in
>>>> starcluster/balancers/sge/__init__.py line 466 to make the pattern match
>>>> yours. Do you know why your time zone may be set differently than other AWS
>>>> instances we've used? Custom images?
>>>> Best,
>>>> Rajat
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2013 at 12:43 PM, <starcluster-request_at_mit.edu> wrote:
>>>>
>>>>> Today's Topics:
>>>>>
>>>>> 1. Unable to mount ebs volume (Jerry Lee, GW/US)
>>>>> 2. LoadBalance (Sergio Mafra)
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: "Jerry Lee, GW/US" <Jerry.Lee_at_genewiz.com>
>>>>> To: <starcluster_at_mit.edu>
>>>>> Cc:
>>>>> Date: Mon, 15 Apr 2013 17:11:38 -0500
>>>>> Subject: [StarCluster] Unable to mount ebs volume
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am a beginner with StarCluster. I created an EBS volume via
>>>>> Amazon AWS and configured it in the config file to use it for my cluster,
>>>>> but no matter what I do, it doesn't automatically mount the EBS volume onto
>>>>> the cluster. Please help.
>>>>>
>>>>> [cluster jerrycluster]
>>>>> EXTENDS = smallcluster
>>>>> VOLUMES = testdata
>>>>>
>>>>> [volume testdata]
>>>>> VOLUME_ID=vol-a0fe24f9
>>>>> MOUNT_PATH=/data
>>>>>
>>>>> >>> Waiting for cluster to come up... (updating every 30s)
>>>>> >>> Waiting for instances to activate...
>>>>> >>> Waiting for all nodes to be in a 'running' state...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Waiting for SSH to come up on all nodes...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Waiting for cluster to come up took 1.847 mins
>>>>> >>> The master node is ec2-54-234-229-206.compute-1.amazonaws.com
>>>>> >>> Setting up the cluster...
>>>>> >>> Configuring hostnames...
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Creating cluster user: None (uid: 1001, gid: 1001)
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring scratch space for user(s): sgeadmin
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring /etc/hosts on each node
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Starting NFS server on master
>>>>> >>> Configuring NFS exports path(s):
>>>>> /home
>>>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Setting up NFS took 0.073 mins
>>>>> >>> Configuring passwordless ssh for root
>>>>> >>> Configuring passwordless ssh for sgeadmin
>>>>> >>> Shutting down threads...
>>>>> 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring SGE...
>>>>> >>> Configuring NFS exports path(s):
>>>>> /opt/sge6
>>>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Setting up NFS took 0.020 mins
>>>>> >>> Installing Sun Grid Engine...
>>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Creating SGE parallel environment 'orte'
>>>>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Adding parallel environment 'orte' to queue 'all.q'
>>>>> >>> Shutting down threads...
>>>>> 20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>>> >>> Configuring cluster took 1.325 mins
>>>>> >>> Starting cluster took 3.197 mins
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jerry Lee
>>>>>
>>>>> Jerry Lee
>>>>> Assistant Manager of Global Infrastructure
>>>>> GENEWIZ Inc.
>>>>> 40 Cragwood Road. Suite 201
>>>>> South Plainfield, NJ 07080
>>>>> Phone: 908-222-0711 ext. 3379
>>>>> Fax: 908-333-4511
>>>>> jerry.lee_at_genewiz.com
>>>>> www.genewiz.com
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Sergio Mafra <sergiohmafra_at_gmail.com>
>>>>> To: "starcluster_at_mit.edu" <starcluster_at_mit.edu>
>>>>> Cc:
>>>>> Date: Tue, 30 Apr 2013 11:05:45 -0300
>>>>> Subject: [StarCluster] LoadBalance
>>>>> Hi fellows,
>>>>>
>>>>> I'm testing StarCluster version 0.9999 and so far so good.
>>>>> One thing that isn't working is loadbalance. This is what I get:
>>>>>
>>>>> ubuntu_at_domU-12-31-39-02-19-36:~$ starcluster loadbalance newcam
>>>>> StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
>>>>> Software Tools for Academics and Researchers (STAR)
>>>>> Please submit bug reports to starcluster_at_mit.edu
>>>>>
>>>>> >>> Starting load balancer (Use ctrl-c to exit)
>>>>> Maximum cluster size: 3
>>>>> Minimum cluster size: 1
>>>>> Cluster growth rate: 1 nodes/iteration
>>>>>
>>>>> *** WARNING - Failed to retrieve stats (1/5):
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
>>>>>     self.stat = self._get_stats()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 487, in _get_stats
>>>>>     now = self.get_remote_time()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 466, in get_remote_time
>>>>>     return datetime.datetime.strptime(str, "%a %b %d %H:%M:%S UTC %Y")
>>>>>   File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
>>>>>     (data_string, format))
>>>>> ValueError: time data 'Tue Apr 30 07:01:35 PDT 2013' does not match format '%a %b %d %H:%M:%S UTC %Y'
>>>>> *** WARNING - Retrying in 60s
>>>>> ^CTraceback (most recent call last):
>>>>>   File "/usr/local/bin/starcluster", line 9, in <module>
>>>>>     load_entry_point('StarCluster==0.9999', 'console_scripts', 'starcluster')()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 313, in main
>>>>>     StarClusterCLI().main()
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cli.py", line 257, in main
>>>>>     sc.execute(args)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute
>>>>>     lb.run(cluster)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 576, in run
>>>>>     self.get_stats()
>>>>>   File "<string>", line 2, in get_stats
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/utils.py", line 92, in wrap_f
>>>>>     res = func(*arg, **kargs)
>>>>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 521, in get_stats
>>>>>     time.sleep(self.polling_interval)
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> All Best,
>>>>>
>>>>> Sergio
>>>>>
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>>>> StarCluster_at_mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
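
The ValueError quoted earlier in this thread comes from a format string that contains the literal text UTC, so a remote `date` output carrying any other zone abbreviation (PDT in this case) cannot match. A minimal sketch of a more tolerant parser follows. The helper name `parse_date_output` is an assumption for illustration and this is not StarCluster's code; it accepts any zone token and returns a naive datetime, so the zone itself is discarded.

```python
import datetime
import re

# Matches output such as "Tue Apr 30 07:01:35 PDT 2013": the date and time,
# then one zone token (any non-space text), then a four-digit year.
_DATE_RE = re.compile(
    r"^(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\d{4})$"
)


def parse_date_output(text):
    """Parse `date` output into a naive datetime, ignoring the zone token."""
    match = _DATE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized date output: %r" % (text,))
    stamp, _zone, year = match.groups()
    # %a and %b are locale-dependent; this assumes English day/month names.
    return datetime.datetime.strptime(
        "%s %s" % (stamp, year), "%a %b %d %H:%M:%S %Y"
    )
```

Because the zone is dropped, the result is only safe to compare with other timestamps taken on the same host; converting to UTC would need the zone's offset, which the abbreviation alone does not reliably provide.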
Received on Tue May 07 2013 - 15:30:39 EDT
This archive was generated by hypermail 2.3.0.
