StarCluster - Mailing List Archive

Re: Jobs not writing output files

From: Ashish Jain <no email>
Date: Wed, 21 Nov 2012 17:04:42 -0800

Hi Rayson,

The exact command is this -

ssh -i key root@publicDns << EOD
qsub -N bt-mz.A.2 -b y -cwd -pe orte 2 mpirun ~/NPB3.3.1-MZ/NPB3.3-MZ-MPI/bin/bt-mz.A.2
EOD
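
As a rough illustration of how this command line varies from one benchmark to the next, the shell sketch below only prints the qsub invocations instead of submitting them. The helper name build_qsub_cmd and the second benchmark name (sp-mz.A.4) are assumptions made for illustration; the qsub options are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the qsub command line for one benchmark.
# Usage: build_qsub_cmd <benchmark-name> <mpi-process-count>
build_qsub_cmd() {
    name=$1
    nprocs=$2
    # Same options as the original command; only the job name and the
    # number of slots requested from the "orte" parallel environment vary.
    printf 'qsub -N %s -b y -cwd -pe orte %s mpirun ~/NPB3.3.1-MZ/NPB3.3-MZ-MPI/bin/%s\n' \
        "$name" "$nprocs" "$name"
}

# The process count is taken from the last dot-separated field of the name.
# sp-mz.A.4 is an illustrative name, not one taken from the original message.
for bench in bt-mz.A.2 sp-mz.A.4; do
    build_qsub_cmd "$bench" "${bench##*.}"
done
```

Printing the commands first makes it possible to review them before piping them to the cluster over ssh.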

1) I'm running the NAS Parallel Benchmarks (NPB). They have classes A to F,
which determine how large the benchmark is, and the last number in the
benchmark name is the number of MPI processes to run on (1, 2, 4, 8...128).
Out of the 43 such benchmarks, 22 gave the correct result. For the rest, the
output is either zero-size, half complete, or missing altogether. If I run
any of these failed benchmarks individually, they run correctly.

2) I've found a few bugs and have collected some log files (around 22). What
is the best way to submit those?
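
For the output problems described in point 1, a minimal sketch of a check that sorts job output files into missing, empty, and non-empty could look like the following. It assumes Grid Engine's default stdout naming (<job name>.o<job id>) and that the files are in the current directory because of -cwd. The helper name classify_output is hypothetical, and the check cannot detect output that is present but only half complete.

```shell
#!/bin/sh
# Hypothetical helper: report whether a job output file is missing,
# empty, or non-empty.
classify_output() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -s "$1" ]; then
        echo "empty"
    else
        echo "non-empty"
    fi
}

# Assumption: stdout files follow the default <job name>.o<job id> pattern.
# If nothing matches, the shell leaves the pattern unexpanded, and that
# single entry is reported as "missing".
for f in ./*.o[0-9]*; do
    printf '%s\t%s\n' "$f" "$(classify_output "$f")"
done
```

The same loop can be pointed at the .e files by changing the pattern to ./*.e[0-9]*.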

Thanks
Ashish


On Wed, Nov 21, 2012 at 12:05 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:

> Hi Ashish,
>
> Can you list the qsub parameters you use to submit the jobs?
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
> On Tue, Nov 20, 2012 at 5:56 AM, Ashish Jain <ashishj_at_usc.edu> wrote:
> > Hi,
> >
> > I'm trying to submit many jobs at one go. I have 3 nodes, each an EC2 1.4x
> > cluster. There are a few glitches I have seen with this -
> >
> > 1) If I submit the jobs at one go (around 6 jobs, each needing one process),
> > apart from the first job, the rest of the jobs are put in a "t" state for a
> > long time.
> > 2) If I space out the jobs (sleep of 15 seconds between calls), the jobs
> > run more smoothly. However, I'm seeing an issue where the jobs are not
> > writing the .o and .e files, and sometimes when they do write them, they are
> > either incomplete or empty.
> >
> > I would like to understand what is happening here. Is there a minimum time
> > between submitting jobs?
> >
> > Thanks
> > Ashish
> >
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster_at_mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
> >
>
Received on Wed Nov 21 2012 - 20:04:44 EST
This archive was generated by hypermail 2.3.0.
