StarCluster - Mailing List Archive

Re: jobs on slave nodes disappear

From: liang cheng <no email>
Date: Sat, 31 Dec 2011 12:18:33 -0800

Hi Justin,

Thanks for your reply. There is no error log or output log, even when I use
the "-e" or "-o" option.
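
For reference, when "-o" and "-e" are not given, Grid Engine normally names a job's stdout and stderr files <jobname>.o<jobid> and <jobname>.e<jobid>. The sketch below builds those names and reports which ones are absent from a directory. The helper names are hypothetical, and the job name and number are taken from the quoted qacct record further down; treat it as a rough check, not a definitive diagnostic.

```python
import os


def default_log_names(job_name, job_id):
    """Return the default (stdout, stderr) file names for a job."""
    return f"{job_name}.o{job_id}", f"{job_name}.e{job_id}"


def missing_logs(directory, job_name, job_id):
    """List the default log files that do not exist in `directory`."""
    return [
        name
        for name in default_log_names(job_name, job_id)
        if not os.path.exists(os.path.join(directory, name))
    ]


# Assumption: the job was submitted from the current directory.
print(missing_logs(".", "single.sh", 23))
```

If both names come back as missing on the submit directory, the job most likely never wrote any output on the slave node, or wrote it somewhere the master cannot see.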

I created a cluster with one master and 10 slave nodes. I made a minor change
on the master node and ran "starcluster createimage i-xxxx AAA BBB", where
"i-xxxx" is the instance id of the master. After I got ami-yyyy, I ran
"starcluster start ami-yyyy". I found that all jobs submitted to the slave
nodes finish instantly, as you can see in the log I sent earlier. Jobs on the
master node run normally.

I haven't used the "restart" command yet, but I will give it a try.

-Liang

On Sat, Dec 31, 2011 at 12:03 PM, Justin Riley <jtriley_at_mit.edu> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Liang,
>
> Is this happening consistently even after restarting the cluster using
> "starcluster restart mycluster"? Also, is there anything in your
> jobs' error logs? Given the output you provided, these would most
> likely be located in the directory you submitted the job from and
> should be named something like "single.sh.e23".
>
> ~Justin
>
>
> On 12/30/2011 08:58 PM, liang cheng wrote:
> > Greetings !
> >
> > I created a StarCluster cluster on EC2 and use qsub to submit jobs. It
> > used to work well. Since this afternoon, after I requested
> > additional EC2 instances from Amazon, a problem has appeared.
> >
> > Only the jobs submitted to the master node are executed. The other
> > jobs disappear almost immediately. Some diagnostics are below. Any
> > help is appreciated!
> >
> > Happy New Year !
> >
> >
> > root@master:/# qacct -j 23
> > ==============================================================
> > qname        all.q
> > hostname     node006
> > group        root
> > owner        root
> > project      NONE
> > department   defaultdepartment
> > jobname      single.sh out 3
> > jobnumber    23
> > taskid       undefined
> > account      sge
> > priority     0
> > qsub_time    Sat Dec 31 01:38:32 2011
> > start_time   Sat Dec 31 01:38:39 2011
> > end_time     Sat Dec 31 01:38:39 2011
> > granted_pe   NONE
> > slots        1
> > failed       0
> > exit_status  0
> > ru_wallclock 0
> > ru_utime     0.010
> > ru_stime     0.010
> > ru_maxrss    2276
> > ru_ixrss     0
> > ru_ismrss    0
> > ru_idrss     0
> > ru_isrss     0
> > ru_minflt    2648
> > ru_majflt    0
> > ru_nswap     0
> > ru_inblock   0
> > ru_oublock   272
> > ru_msgsnd    0
> > ru_msgrcv    0
> > ru_nsignals  0
> > ru_nvcsw     12
> > ru_nivcsw    3
> > cpu          0.020
> > mem          0.000
> > io           0.000
> > iow          0.000
> > maxvmem      0.000
> > arid         undefined
> >
> > =========================
> >
> > Thanks, -Liang
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk7/aooACgkQ4llAkMfDcrmFegCfULuLAaDIrEvDi1257HZR3ico
> B5wAn2rGWD5D9c4rETIq07d6jKq/jrCs
> =pb1b
> -----END PGP SIGNATURE-----
>
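
The qacct record quoted above shows identical start and end times and a wallclock of 0 on node006. A minimal sketch of how such records could be screened automatically follows, assuming qacct's usual layout of one "key value" pair per line. The sample text reuses a few fields from the record above; the function names and the one-second threshold are illustrative assumptions, not part of Grid Engine or StarCluster.

```python
from datetime import datetime

# A few fields copied from the quoted qacct record, one pair per line.
SAMPLE = """\
qname        all.q
hostname     node006
jobnumber    23
start_time   Sat Dec 31 01:38:39 2011
end_time     Sat Dec 31 01:38:39 2011
failed       0
exit_status  0
ru_wallclock 0
"""

# Timestamp layout used in the record, e.g. "Sat Dec 31 01:38:39 2011".
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_record(text):
    """Turn 'key value' lines into a dict, skipping separator lines."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("="):
            record[parts[0]] = parts[1].strip()
    return record


def finished_instantly(record, threshold_seconds=1):
    """True when the job ran for less than threshold_seconds."""
    start = datetime.strptime(record["start_time"], TIME_FORMAT)
    end = datetime.strptime(record["end_time"], TIME_FORMAT)
    return (end - start).total_seconds() < threshold_seconds


record = parse_record(SAMPLE)
print(record["hostname"], finished_instantly(record))
```

Grouping the flagged records by hostname would show whether only the slave nodes are affected, which matches the symptom described in this thread.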
Received on Sat Dec 31 2011 - 15:18:34 EST
This archive was generated by hypermail 2.3.0.
