StarCluster - Mailing List Archive

jobs on slave nodes disappear

From: liang cheng <no email>
Date: Fri, 30 Dec 2011 17:58:20 -0800

Greetings !

I created a StarCluster cluster on EC2 and use qsub to submit jobs. It used to
work well. Starting this afternoon, after I requested additional EC2
instances from Amazon, a problem appeared.

Only the jobs submitted to the master node are executed. Other jobs
disappear almost immediately. Some diagnostic output is below. Any help is
appreciated!

Happy New Year !


root@master:/# qacct -j 23
==============================================================
qname all.q
hostname node006
group root
owner root
project NONE
department defaultdepartment
jobname single.sh out 3
jobnumber 23
taskid undefined
account sge
priority 0
qsub_time Sat Dec 31 01:38:32 2011
start_time Sat Dec 31 01:38:39 2011
end_time Sat Dec 31 01:38:39 2011
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 0
ru_utime 0.010
ru_stime 0.010
ru_maxrss 2276
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 2648
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 272
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 12
ru_nivcsw 3
cpu 0.020
mem 0.000
io 0.000
iow 0.000
maxvmem 0.000
arid undefined

=========================
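
The record above shows the pattern in question: failed 0, exit_status 0, and identical start_time and end_time on node006. Below is a minimal sketch, assuming `qacct -j` output has been saved as text with one "field value" pair per line as shown above, of how such records could be flagged automatically. The names parse_qacct_record, finished_instantly, and SAMPLE are illustrative only and are not part of SGE or StarCluster.

```python
def parse_qacct_record(text):
    """Parse one `qacct -j` record into a dict of field name -> string value."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the ===== separator rows.
        if not line or set(line) == {"="}:
            continue
        # The field name is the first word; the rest of the line is the value.
        key, _, value = line.partition(" ")
        record[key] = value.strip()
    return record


def finished_instantly(record):
    """True when the job reports success but started and ended in the same second."""
    return (
        record.get("failed") == "0"
        and record.get("exit_status") == "0"
        and record.get("start_time") is not None
        and record.get("start_time") == record.get("end_time")
    )


# Sample fields copied from the record above (assumed layout).
SAMPLE = """\
hostname node006
jobnumber 23
start_time Sat Dec 31 01:38:39 2011
end_time Sat Dec 31 01:38:39 2011
failed 0
exit_status 0
"""

record = parse_qacct_record(SAMPLE)
print(record["hostname"], record["jobnumber"], finished_instantly(record))
```

A job flagged this way is not necessarily broken, since a very short script can legitimately finish within a second. It only narrows down which nodes and job numbers are worth checking by hand.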

Thanks,
-Liang
Received on Fri Dec 30 2011 - 20:58:21 EST