Many jobs stuck in "t state"


From: Ying Sonia Ting <no email>
Date: Tue, 24 Feb 2015 12:08:44 -0800

Hi all,

This might be more of an SGE issue than a StarCluster issue, but I'd really
appreciate any comments.

I have a bunch of jobs running on AWS spot instances using StarCluster.
Most of them get stuck in "t state" for hours before they finally execute
(in the r state). For instance, 50% of the jobs that are not in qw right
now are in "t state".
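
For anyone who wants to reproduce the "50% not in qw are in t" figure, here
is a minimal sketch that tallies job states from plain `qstat` text output.
It assumes the default column layout (job-ID, prior, name, user, state, ...);
the function names and the sample rows are hypothetical and were made up for
illustration, not taken from the cluster described here.

```python
from collections import Counter


def tally_job_states(qstat_output: str) -> Counter:
    """Count jobs per state from plain-text `qstat` output."""
    counts = Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; this skips the header row
        # and the dashed separator line.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[fields[4]] += 1  # fifth column is the state (r, t, qw, ...)
    return counts


def fraction_in_state(counts: Counter, state: str, exclude=("qw",)) -> float:
    """Share of jobs in `state` among all jobs not in an excluded state."""
    considered = sum(n for s, n in counts.items() if s not in exclude)
    return counts.get(state, 0) / considered if considered else 0.0


# Hypothetical sample rows in the default qstat layout (not real output).
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
--------------------------------------------------------------------------------------------------
     11 0.55500 job.sh     sgeadmin     r     02/24/2015 11:58:01 all.q@node001      1 1
     11 0.55500 job.sh     sgeadmin     t     02/24/2015 11:58:01 all.q@node002      1 2
     12 0.00000 job.sh     sgeadmin     qw    02/24/2015 12:00:00                    1 3
"""

if __name__ == "__main__":
    counts = tally_job_states(SAMPLE)
    print(dict(counts))
    print(fraction_in_state(counts, "t"))
```

On a live cluster the text could come from something like
`subprocess.run(["qstat", "-u", "*"], capture_output=True, text=True).stdout`,
assuming `qstat` is on the PATH of the master node.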

The same program/script/AMI has been used frequently, and this is the worst
it has ever been. The only difference is that this time the jobs are
processing bigger files (~6G each, 90 of them) located on an NFS-shared gp2
volume. Jobs were divided into tasks to ensure that only 4-5 jobs process
the same file at once. Memory was not even close to being overloaded (only
5G used out of 240G on each node). The long waits in "t state" are wasting
money and CPU hours.

Have any of you seen this issue before? Is there any way I can fix or work
around it?

Thanks a lot,
Sonia

-- 
Ying S. Ting
Ph.D. Candidate, MacCoss Lab
Department of Genome Sciences, University of Washington

Received on Tue Feb 24 2015 - 15:09:44 EST

Next in thread: Jennifer Staab: "Re: Many jobs stuck in "t state""
Reply: Jennifer Staab: "Re: Many jobs stuck in "t state""
