StarCluster - Mailing List Archive

Re: load balanced nodes accepting jobs before ready

From: Stewart, Andrew <no email>
Date: Thu, 24 Apr 2014 16:11:31 +0000

Awesome, I'll try that. The thought occurred to me but I wasn't sure if
the SGE plugin was a special case that had to be run outside of the
context of the optional plugin list.

Thanks!
Andrew


--
Andrew Stewart
Office of Research Information Services (ORIS),
Office of the Chief Information Officer (OCIO),
Smithsonian Institution
202-505-3633
On 4/24/14, 12:09 PM, "Justin Riley" <jtriley_at_MIT.EDU> wrote:
>Hey Stewart,
>
>You can fix this issue by setting disable_queue=True in your config to
>disable the default SGE plugin. Then you can define the SGE plugin in
>your config, add it to your plugins list, and then move the pkginstaller
>(and any other plugins that need to run before the node gets added)
>*before* SGE in the list. This will ensure all other plugins get
>executed before the node gets added to SGE. See the following doc for
>more details on setting disable_queue and defining the SGE plugin in
>your config:
>
>http://star.mit.edu/cluster/docs/latest/plugins/sge.html#advanced-options
>
>~Justin
>
>On Mon, Apr 14, 2014 at 06:02:20PM +0000, Stewart, Andrew wrote:
>>    pkginstaller was called during add_node, but the node was added to the
>>    host list and its queue enabled before pkginstaller had a chance to
>>    finish installing dependencies.  So it looks like a race condition.  I
>>    did bump pkginstaller to the front of the plugins line (ahead of
>>    IPCluster) but I haven't yet bothered to test whether that helps the
>>    situation any.  The most certain way to handle it would be to just
>>    disable the queue until provisioning is complete.
>>    I actually think the simpler solution would be to bypass pkginstaller
>>    and just share managed packages with compute nodes via NFS.  Why
>>    reinstall the same package N times?
>>    --
>>    Andrew Stewart
>>    Office of Research Information Services (ORIS),
>>    Office of the Chief Information Officer (OCIO),
>>    Smithsonian Institution
>>    202-505-3633
>>    From: Rajat Banerjee <rajatb_at_post.harvard.edu>
>>    Date: Monday, April 14, 2014 at 10:49 AM
>>    To: Andrew Stewart <stewarta_at_si.edu>
>>    Cc: "starcluster_at_mit.edu" <starcluster_at_mit.edu>
>>    Subject: Re: [StarCluster] load balanced nodes accepting jobs before
>>    ready
>>    Hi,
>>    Does that mean that the pkginstaller plugin doesn't get called during
>>    add_node, before the host is added to the SGE host list?
>>    Raj
>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> 
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>
Received on Thu Apr 24 2014 - 12:11:34 EDT
This archive was generated by hypermail 2.3.0.
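
For readers landing on this thread: the configuration Justin describes might
look roughly like the fragment below. It is a minimal sketch under stated
assumptions, not a tested config. The cluster name "mycluster" and the
package names are hypothetical placeholders, and the setup_class paths are
assumed to be those of the stock StarCluster SGE and pkginstaller plugins;
check them against the SGE plugin doc Justin links to before relying on them.

```ini
# Sketch only: "mycluster" and the package list are hypothetical placeholders.
[cluster mycluster]
# Turn off the built-in SGE setup so SGE can be ordered like any other plugin.
disable_queue = True
# Plugins run in the order listed, so pkginstaller comes before sge:
# packages should be installed before the node is added to the queue.
plugins = pkginstaller, sge

[plugin pkginstaller]
# Assumed class path for the stock package installer plugin.
setup_class = starcluster.plugins.pkginstaller.PackageInstaller
# Placeholder package names; substitute the real dependencies.
packages = package-one, package-two

[plugin sge]
# Assumed class path for the stock SGE plugin.
setup_class = starcluster.plugins.sge.SGEPlugin
```

Any other plugin that must finish before a node starts accepting jobs would
go ahead of sge in the plugins line in the same way.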
