Re: jobs on slave nodes disappear
Hi Liang,
Is this happening consistently even after restarting the cluster using
"starcluster restart mycluster"? Also, is there anything in your
job(s)' error logs? Given the output you provided, these would most
likely be located in the directory you submitted the job from and
should be named something like "single.sh.e23".
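A minimal sketch for checking those logs from the shell. It assumes Grid Engine's default naming, `<jobname>.o<jobid>` for stdout and `<jobname>.e<jobid>` for stderr; `show_job_logs` is a hypothetical helper, not part of SGE or StarCluster. The directories to search are passed in explicitly: typically the submit directory (for jobs submitted with `-cwd`) and `$HOME` (where SGE usually writes the logs otherwise).

```shell
# Hypothetical helper: print the stdout/stderr logs of an SGE job.
# Usage: show_job_logs <jobname> <jobid> <dir> [<dir> ...]
# Assumes the default <jobname>.o<jobid> / <jobname>.e<jobid> names.
# Returns 0 if at least one log file was found, 1 otherwise.
show_job_logs() {
    name=$1
    id=$2
    shift 2
    found=1
    for dir in "$@"; do
        for f in "$dir/$name.o$id" "$dir/$name.e$id"; do
            if [ -f "$f" ]; then
                echo "== $f"
                cat "$f"
                found=0
            fi
        done
    done
    return $found
}
```

For the job above, that would be something like `show_job_logs single.sh 23 "$PWD" "$HOME"`, run from the directory the job was submitted from.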
~Justin
On 12/30/2011 08:58 PM, liang cheng wrote:
> Greetings!
>
> I created a StarCluster cluster on EC2 and use qsub to submit jobs.
> It used to work well, but since this afternoon, after I requested an
> additional EC2 instance from Amazon, the following issue has appeared.
>
> Only the jobs submitted to the master node are executed; the other
> jobs disappear almost immediately. Some diagnostic output is below.
> Any help is appreciated!
>
> Happy New Year!
>
>
> root@master:/# qacct -j 23
> ==============================================================
> qname        all.q
> hostname     node006
> group        root
> owner        root
> project      NONE
> department   defaultdepartment
> jobname      single.sh out 3
> jobnumber    23
> taskid       undefined
> account      sge
> priority     0
> qsub_time    Sat Dec 31 01:38:32 2011
> start_time   Sat Dec 31 01:38:39 2011
> end_time     Sat Dec 31 01:38:39 2011
> granted_pe   NONE
> slots        1
> failed       0
> exit_status  0
> ru_wallclock 0
> ru_utime     0.010
> ru_stime     0.010
> ru_maxrss    2276
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    2648
> ru_majflt    0
> ru_nswap     0
> ru_inblock   0
> ru_oublock   272
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     12
> ru_nivcsw    3
> cpu          0.020
> mem          0.000
> io           0.000
> iow          0.000
> maxvmem      0.000
> arid         undefined
>
> =========================
>
> Thanks, -Liang
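To pull out just the accounting fields that show where a job ran and how it ended (host, failure code, exit status, wall-clock time), a small filter over `qacct -j` output could be used. This is a sketch: `qacct_summary` is a hypothetical helper, and the field names are the ones that appear in the record quoted above.

```shell
# Hypothetical filter: read a `qacct -j <id>` record on stdin and keep
# only the fields that show where the job ran and how it ended.
qacct_summary() {
    awk '$1 == "hostname" || $1 == "failed" || $1 == "exit_status" ||
         $1 == "ru_wallclock" { print $1, $2 }'
}
```

Usage would be along the lines of `qacct -j 23 | qacct_summary`.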
Received on Sat Dec 31 2011 - 15:03:24 EST