StarCluster - Mailing List Archive

Re: CG1 plus StarCluster Questions

From: Rayson Ho <no email>
Date: Sat, 12 May 2012 00:00:55 -0400

Yes, you can take a snapshot and create a new AMI - the GPU
consumable resource you added will be saved in the master host's data
store (in Grid Engine terms, the qmaster spool).
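
For reference, defining that consumable in the first place boils down
to a couple of qconf calls. Here is a rough sketch in Python (the
complex name "gpu", the count of 2, and the "master" hostname are just
example values, not something StarCluster sets up for you):

    import subprocess

    def sge(cmd):
        """Run a Grid Engine admin command on the qmaster host."""
        subprocess.check_call(cmd, shell=True)

    # 1) Append a "gpu" consumable to the complex configuration.
    #    Field order: name shortcut type relop requestable consumable
    #    default urgency (see the consumable HOWTO quoted below).
    sge("qconf -sc > /tmp/complexes"
        " && echo 'gpu gpu INT <= YES YES 0 0' >> /tmp/complexes"
        " && qconf -Mc /tmp/complexes")

    # 2) Grant 2 GPUs on the CG1 execution host (example hostname).
    sge("qconf -aattr exechost complex_values gpu=2 master")

Jobs then request a GPU with something like "qsub -l gpu=1 job.sh",
and Grid Engine decrements the counter while the job is running.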

I always boot StarCluster from fresh AMIs, so I am not very familiar
with this workflow. I *think* it can be a problem if you have a
pre-configured Grid Engine installation baked into the AMI:

- if you let StarCluster run the setup phase again, then it will
overwrite the previous configuration.

- if you don't let StarCluster rerun the setup, then the hostname
mapping might not be valid anymore... i.e. the new instances won't
get the same IPs, so the mappings in /etc/hosts need to be redone...
and the Grid Engine execution hosts need to be re-added as well (a
rough sketch of that cleanup follows below).
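
Something along these lines (only a sketch; the hostnames are
placeholders, and in practice you would pull them from /etc/hosts or
from StarCluster itself) would be needed to reconcile the old qmaster
spool with the new instances:

    import subprocess

    def sge(cmd):
        """Run a qconf command on the master and return its output."""
        return subprocess.check_output(cmd, shell=True).decode()

    # Execution hosts the old qmaster spool still remembers.
    known = set(sge("qconf -sel").split())

    # Hostnames that actually exist in the new cluster (placeholders).
    current = {"master", "node001", "node002"}

    # Stale entries from the previous cluster have to be removed...
    for host in sorted(known - current):
        sge("qconf -de %s" % host)

    # ...and the new hosts registered.  "qconf -ae" is interactive, so
    # a scripted setup would feed a host description file to
    # "qconf -Ae" for each name in (current - known) instead.
    print("execution hosts still to add:", sorted(current - known))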

I will check with Justin and see how StarCluster can be improved to
streamline the GPU setup process.
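
One shape that improvement could take is a small plugin that repeats
the qconf steps after the normal setup phase, so the consumable
survives whatever AMI you start from. A very rough sketch (the
ClusterSetup/run() interface follows StarCluster's plugin docs as I
remember them; the ssh.execute() calls, node.alias, and the GPU count
are assumptions):

    from starcluster.clustersetup import ClusterSetup

    class GPUConsumablePlugin(ClusterSetup):
        """Define a "gpu" consumable and grant it on every exec host."""

        def __init__(self, gpus_per_node=2):
            self.gpus_per_node = int(gpus_per_node)

        def run(self, nodes, master, user, user_shell, volumes):
            # Add the consumable to the complex configuration once.
            master.ssh.execute(
                "qconf -sc > /tmp/complexes && "
                "grep -q '^gpu ' /tmp/complexes || "
                "(echo 'gpu gpu INT <= YES YES 0 0' >> /tmp/complexes"
                " && qconf -Mc /tmp/complexes)")
            # Grant the GPUs on each execution host.
            for node in nodes:
                master.ssh.execute(
                    "qconf -aattr exechost complex_values gpu=%d %s"
                    % (self.gpus_per_node, node.alias))

The plugin would then be referenced from a plugin section in the
StarCluster config file; again, this is a sketch of the idea, not a
tested plugin.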

Rayson

================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/



On Fri, May 11, 2012 at 11:43 PM, Scott Le Grand <varelse2005_at_gmail.com> wrote:
> 15 minutes here, but I've added it.  Can I now create an AMI from the master
> instance so I don't need to do this every time I spin up a cluster?
>
> Scott
>
>
>
> On Fri, May 11, 2012 at 8:22 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>>
>> That's a known issue - and we would like to understand why it is taking so
>> long.
>>
>> If you leave it there for around 3-5 minutes, qmon will show up. Over
>> a LAN connection it is not painful, but over a high-latency network
>> starting qmon takes forever :-(
>>
>> Rayson
>>
>> ================================
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>>
>>
>> On Fri, May 11, 2012 at 11:18 PM, Scott Le Grand <varelse2005_at_gmail.com>
>> wrote:
>> > StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
>> >
>> > If I starcluster sshmaster -X mycluster and type qmon, then the
>> > splash screen for it shows up but it doesn't seem to progress from
>> > there.  How long should it take to get past that?
>> >
>> > Scott
>> >
>> >
>> >
>> > On Fri, May 11, 2012 at 8:15 PM, Rayson Ho <raysonlogin_at_gmail.com>
>> > wrote:
>> >>
>> >> If you have a recent enough version of StarCluster, then you should
>> >> be able to run qmon without any special SSH X-forwarding settings.
>> >>
>> >> This was added in: https://github.com/jtriley/StarCluster/issues/81
>> >>
>> >> Rayson
>> >>
>> >> ================================
>> >> Open Grid Scheduler / Grid Engine
>> >> http://gridscheduler.sourceforge.net/
>> >>
>> >> Scalable Grid Engine Support Program
>> >> http://www.scalablelogic.com/
>> >>
>> >>
>> >>
>> >> On Fri, May 11, 2012 at 10:58 PM, Scott Le Grand
>> >> <varelse2005_at_gmail.com>
>> >> wrote:
>> >> > This is a stupid question but...
>> >> >
>> >> > Given I access a starcluster cluster indirectly, how do I run an X
>> >> > application such that it displays on my remote system?
>> >> >
>> >> > I would normally type ssh -X ec2-user_at_amazoninstance.com qmon
>> >> > in order to fire up qmon, yes?
>> >> >
>> >> > How do I do the equivalent here?
>> >> >
>> >> > On Fri, May 11, 2012 at 2:45 PM, Rayson Ho <raysonlogin_at_yahoo.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Scott,
>> >> >>
>> >> >> You can set up a consumable resource to track usage of GPUs:
>> >> >>
>> >> >> http://gridscheduler.sourceforge.net/howto/consumable.html
>> >> >>
>> >> >> And we also have a load sensor that monitors the GPU devices:
>> >> >>
>> >> >> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>> >> >>
>> >> >> If you want to use the second (i.e. dynamic) method, then you
>> >> >> will need to set it up by following this HOWTO:
>> >> >>
>> >> >> http://gridscheduler.sourceforge.net/howto/loadsensor.html
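
For reference, a load sensor is just a long-running script that Grid
Engine's execd starts and talks to over stdin/stdout using a
begin / host:name:value / end protocol. A minimal Python sketch of the
idea (the "gpu_free" metric name and the nvidia-smi parsing are
assumptions for illustration, not what gpu_sensor.c does):

    #!/usr/bin/env python
    # Minimal Grid Engine load sensor sketch: report how many GPUs
    # look idle on this host.
    import socket
    import subprocess
    import sys

    HOST = socket.gethostname()

    def free_gpus():
        # Count GPUs reporting 0% utilization; the exact nvidia-smi
        # invocation and parsing depend on the driver version, so
        # treat this as a placeholder.
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"]).decode()
        return sum(1 for line in out.splitlines() if line.strip() == "0")

    # execd writes a line to stdin for each load report cycle; "quit"
    # means shut down, anything else means "send a report now".
    for line in sys.stdin:
        if line.strip() == "quit":
            break
        print("begin")
        print("%s:gpu_free:%d" % (HOST, free_gpus()))
        print("end")
        sys.stdout.flush()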
>> >> >>
>> >> >> The first method of using a consumable resource works best if
>> >> >> you don't run GPU programs outside of Open Grid Scheduler/Grid
>> >> >> Engine.
>> >> >>
>> >> >> Also note that in the next release of StarCluster, GPU support
>> >> >> will be enhanced.
>> >> >>
>> >> >> Rayson
>> >> >>
>> >> >> =================================
>> >> >> Open Grid Scheduler / Grid Engine
>> >> >> http://gridscheduler.sourceforge.net/
>> >> >>
>> >> >> Scalable Grid Engine Support Program
>> >> >> http://www.scalablelogic.com/
>> >> >>
>> >> >>
>> >> >> ________________________________
>> >> >> From: Scott Le Grand <varelse2005_at_gmail.com>
>> >> >> To: starcluster_at_mit.edu
>> >> >> Sent: Friday, May 11, 2012 5:25 PM
>> >> >> Subject: [StarCluster] CG1 plus StarCluster Questions
>> >> >>
>> >> >> Hey guys, I'm really impressed with StarCluster and I've used it
>> >> >> to create clusters ranging from 2 to 70 instances...
>> >> >>
>> >> >> I've also customized it to use CUDA 4.2 and 295.41, the latest
>> >> >> toolkit and driver, because my code has GTX 680 support and I
>> >> >> don't want to have to comment it out just to build it (and 4.1
>> >> >> had a horrendous perf regression).
>> >> >>
>> >> >> Anyway, 2 questions, one of which I think you already answered:
>> >> >>
>> >> >> 1. I'd like to set up a custom AMI that by default has 2 GPUs
>> >> >> configured as a consumable resource.  I already have code to
>> >> >> utilize exclusive mode and choose whichever GPU isn't in use in
>> >> >> my app, but that all falls down because the queueing system is
>> >> >> based on CPU cores rather than GPU count.  How would I set this
>> >> >> up once so I can save the customized AMI and never have to do
>> >> >> it again?
>> >> >>
>> >> >> 2. I'm also seeing the .ssh directories disappear on restart.
>> >> >> But I'll look at your solution as I've just been restarting the
>> >> >> whole cluster up to now.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> StarCluster mailing list
>> >> >> StarCluster_at_mit.edu
>> >> >> http://mailman.mit.edu/mailman/listinfo/starcluster
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > StarCluster mailing list
>> >> > StarCluster_at_mit.edu
>> >> > http://mailman.mit.edu/mailman/listinfo/starcluster
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> ==================================================
>> >> Open Grid Scheduler - The Official Open Source Grid Engine
>> >> http://gridscheduler.sourceforge.net/
>> >
>> >
>>
>>
>>
>> --
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>
>



-- 
==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
Received on Sat May 12 2012 - 00:00:57 EDT