When I try to submit like Chris suggested I get the following:
$ qsub -pe threaded=4 job.sh
qsub: Numerical value invalid!
The initial portion of string "./job.sh" contains no decimal number
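For what it's worth, the error text suggests qsub took "threaded=4" as the parallel environment name and then tried to read "job.sh" as the slot count. The qsub man page documents the option as "-pe <pe_name> <slots>", with the name and the slot count as separate arguments rather than joined by "=". A minimal sketch of building that command line from Python, assuming a PE named "threaded" has already been created on the qmaster (the helper name build_qsub_command is made up for illustration):

```python
def build_qsub_command(pe_name, slots, script):
    """Return the qsub argument list for a job asking for `slots` slots in `pe_name`."""
    slots = int(slots)
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    # -pe takes two separate arguments: the PE name, then the slot count.
    return ["qsub", "-pe", pe_name, str(slots), script]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_qsub_command("threaded", 4, "job.sh")))
```

This only helps once qhost can reach the qmaster; the PE name has to match whatever "qconf -spl" lists on that host.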
When I run my app interactively outside of sge and look at htop it only uses one core :(
In my original message I tried to explain that I launched the starcluster ami on a single ec2 instance, so I'm not working with a cluster. But I'd still like to take advantage of all the cores.
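Since this is a single instance, one thing worth trying before involving SGE at all is to set the thread count explicitly when launching the script. A rough sketch, assuming the numpy on the AMI is linked against a BLAS that honours OMP_NUM_THREADS (an ATLAS build may have its thread count fixed at compile time, in which case this changes nothing). thread_env is a hypothetical helper; job.py is the script mentioned in this thread:

```python
import os
import subprocess
import sys


def thread_env(n_threads=None, base=None):
    """Copy an environment and set OMP_NUM_THREADS (default: one thread per core)."""
    env = dict(os.environ if base is None else base)
    if n_threads is None:
        n_threads = os.cpu_count() or 1
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


if __name__ == "__main__":
    # Launch the job in a child process so the variable is set before numpy loads.
    subprocess.call([sys.executable, "job.py"], env=thread_env())
```

If htop still shows one busy core after this, the limit is more likely in how the BLAS was built than in the environment.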
-----Original Message-----
From: Rayson Ho [mailto:raysonlogin_at_yahoo.com]
Sent: Wednesday, August 31, 2011 11:09 AM
To: starcluster_at_mit.edu; Bill Lennon
Subject: RE: [StarCluster] Starcluster - Taking advantage of multiple cores on EC2
--- On Wed, 8/31/11, Bill Lennon <blennon_at_shopzilla.com> wrote:
> 1) What do you get when you run "qhost" on the EC2 cluster??
>
> error: commlib error: got select error (Connection
> refused)
> error: unable to send message to qmaster using port 6444 on host
> "localhost": got send error
Looks like you are not able to connect to the SGE qmaster... did you actually submit jobs to SGE??
> 2) If you run your application outside of SGE on your EC2 cluster, do
> you get the same behavior??
>
> If I 'python job.py' I don't see those errors...if that's what you're
> asking?
I mean, on one of your EC2 nodes, run your application interactively. Then run "top" or "uptime" and see if outside of SGE, your application is able to use all the cores on the node.
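The same check can be scripted. A small sketch using only the Python standard library (it assumes a Unix host, since os.getloadavg is not available on Windows; cores_in_use is a made-up name). Start the application, give it a minute or so, then run:

```python
import os


def cores_in_use(load_1min, n_cores):
    """Return the 1-minute load average as a fraction of the available cores."""
    return load_1min / float(n_cores)


if __name__ == "__main__":
    n_cores = os.cpu_count() or 1
    load_1min = os.getloadavg()[0]  # the first of the three figures "uptime" reports
    print("cores: %d, 1-min load: %.2f, ratio: %.2f"
          % (n_cores, load_1min, cores_in_use(load_1min, n_cores)))
```

A load that stays near 1.0 on a multi-core node (a ratio close to 1 divided by the core count) suggests the application is running on a single core.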
Rayson
=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net
>
> 3) Intel MKL uses OpenMP internally, did you set the env.
> var. OMP_NUM_THREADS on the laptop??
>
> Nope.
>
> Hope that may give you a lead. I'm unfortunately a noob.
>
> -----Original Message-----
> From: Rayson Ho [mailto:raysonlogin_at_yahoo.com]
>
> Sent: Wednesday, August 31, 2011 10:57 AM
> To: Bill Lennon; starcluster_at_mit.edu
> Subject: Re: [StarCluster] Starcluster - Taking advantage of multiple
> cores on EC2
>
> Bill,
>
> 1) What do you get when you run "qhost" on the EC2 cluster??
>
> 2) If you run your application outside of SGE on your EC2 cluster, do
> you get the same behavior??
>
> 3) Intel MKL uses OpenMP internally, did you set the env.
> var. OMP_NUM_THREADS on the laptop??
>
> Rayson
>
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
>
>
>
> --- On Wed, 8/31/11, Chris Dagdigian <dag_at_bioteam.net>
> wrote:
> > Grid Engine just executes jobs and manages resources.
> >
> > It's up to your code to use more than one core.
> >
> > Maybe there is a config difference between your local scipy/numpy etc.
> > install and how StarCluster deploys its version?
> >
> > Grid Engine assumes by default a 1:1 ratio between job and CPU core
> > unless you are explicitly submitting to a parallel environment.
> >
> > If you are the only user on a small cluster you probably don't have to
> > do much; the worst that could happen would be that SGE queues up and
> > runs more than one of your threaded app jobs on the same host and they
> > end up competing for CPU/memory resources to the detriment of all.
> >
> > One way around that would be to configure exclusive job access and
> > submit your job with the "exclusive" request. That will ensure that
> > your job when it runs will get an entire execution host.
> >
> > Another way is to fake up a parallel environment. For your situation
> > it is very common for people to build a parallel environment called
> > "Threaded" or "SMP" so that they can run threaded apps without
> > oversubscribing an execution host.
> >
> > With a threaded PE set up you'd submit your job:
> >
> > $ qsub -pe threaded=<# CPU> my-job-script.sh
> >
> > ... and SGE would account for your single job using more than one CPU
> > on a single host.
> >
> >
> > FYI Grid Engine has recently picked up some Linux core binding
> > enhancements that make it easier to pin jobs and tasks to specific
> > cores. I'm not sure if the version of GE that is built into
> > StarCluster today has those features yet but it should gain them
> > eventually.
> >
> > Regards,
> > Chris
> >
> > Bill Lennon wrote:
> > > Dear Starcluster Gurus,
> > >
> > > I’ve successfully loaded the Starcluster AMI onto a single high-memory
> > > quadruple extra large instance and am performing an SVD on a large
> > > sparse matrix and then performing k-means on the result. However, I’m
> > > only taking advantage of one core when I do this? On my laptop (using
> > > scipy numpy, intel MKL), on a small version of this, all cores are taken
> > > advantage of automagically. Is there an easy way to do this with a
> > > single starcluster instance with Atlas? Or do I need to explicitly write
> > > my code to multithread?
> > >
> > > My thanks,
> > >
> > > Bill
> > >
>
>
Received on Wed Aug 31 2011 - 14:18:43 EDT