StarCluster - Mailing List Archive

Re: Easy way to delete more than 100k jobs

From: Jacob Barhak <no email>
Date: Mon, 23 Feb 2015 15:33:59 -0600

Thanks Lyn, Thanks Rayson,

For those who may be reading this in the future looking for a solution,
here is a partial solution.

It does not reduce the time needed to delete many jobs, yet it prevents the
system from crashing repeatedly during the attempt to delete them.

Here is what I did:
while sleep 600; do timeout 480 qdel -u UserName ; done

Just replace 600 with a safe period (in seconds) for the system to recover,
480 with the approximate time the system runs before memory is exhausted,
and UserName with your user. These numbers will vary from system to system.

This will delete a chunk of jobs at a time without crashing the system. I am
still waiting after about 9 hours, yet I have not needed to restart the
server due to SGE crashing.
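
Another option, if a single long-running qdel is the problem, is to feed
qdel the job IDs in fixed-size batches. A rough, untested sketch (the batch
size of 1000 and the 60-second pause are guesses, and it assumes the usual
qstat output with two header lines):

  qstat -u UserName | awk 'NR>2 {print $1}' | sort -u | \
    xargs -n 1000 sh -c 'qdel "$@"; sleep 60' _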

I hope this solution will help others.

         Jacob
On Feb 23, 2015 2:03 AM, "Rayson Ho" <raysonlogin_at_gmail.com> wrote:

> Is your local cluster using classic or BerkeleyDB spooling? If it is
> classic over NFS, then qdel can be very slow.
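>
> If you are not sure which spooling method is in use, it is recorded in the
> cell's bootstrap file (standard install layout assumed):
>
>     grep spooling_method $SGE_ROOT/$SGE_CELL/common/bootstrap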
>
> One quick workaround is to hide the job spooling files manually, just move
> the spooled jobs from $SGE_ROOT/$SGE_CELL/spool/qmaster/jobs to a private
> backup directory.
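>
> A rough sketch of that workaround (the backup directory is just an example
> path, and shutting down the qmaster before touching its spool is an
> assumption here, not something the docs prescribe):
>
>     qconf -km                                    # shut down the qmaster
>     mkdir -p /var/tmp/sge_jobs_backup
>     mv $SGE_ROOT/$SGE_CELL/spool/qmaster/jobs/* /var/tmp/sge_jobs_backup/
>     $SGE_ROOT/$SGE_CELL/common/sgemaster start   # bring the qmaster back up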
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
>
>
> On Sun, Feb 22, 2015 at 8:31 PM, Jacob Barhak <jacob.barhak_at_gmail.com>
> wrote:
>
>> Hi to SGE experts,
>>
>> This is an SGE question rather than a StarCluster one. I am actually
>> having this issue on a local cluster, and I did raise this issue a while
>> ago, so sorry for the repetition. If you know of another list that can
>> help, please direct me there.
>>
>> The qdel command does not respond well to a large number of jobs. More
>> than 100k jobs makes things intolerable.
>>
>> It takes a long time and consumes too much memory when trying to delete
>> all jobs.
>>
>> Is there a shortcut someone is aware of to clear the entire queue without
>> waiting for many hours or running the server out of memory?
>>
>> Will removing the StarCluster server and reinstalling it work? If so, how
>> can the long configuration be bypassed? Are there a few files that will do
>> the trick if handled properly?
>>
>> I hope someone has a quick solution.
>>
>> Jacob
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>
>
Received on Mon Feb 23 2015 - 16:34:01 EST