Re: starcluster plugin status code 127


From: Justin Riley <no email>
Date: Wed, 21 Dec 2011 11:55:17 -0500

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Wei,

The problem is that qconf is not in your default PATH when running
execute, so the remote shell cannot find it. Status 127 is the shell's
exit code for "command not found". SGE is installed in /opt/sge6 and
relies on /etc/profile.d to set up the paths correctly. Unfortunately
these configs aren't automatically loaded when executing commands over
SSH. For now you can fix this using:

node.ssh.execute('source /etc/profile && qconf -mattr queue load_thresholds np_load_avg=1.5 all.q')
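
As background on where status 127 comes from, here is a minimal local sketch that does not use StarCluster. It runs a command through /bin/sh with a restricted PATH and prints the exit status. The helper name run_in_shell and the command name qconf_not_on_path are made up for illustration; the latter stands in for qconf on a PATH that lacks the SGE directories.

```python
import subprocess


def run_in_shell(command, path):
    """Run `command` through /bin/sh with PATH limited to `path`.

    Returns the shell's exit status.
    """
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        env={"PATH": path},
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


# 'qconf_not_on_path' is a hypothetical name standing in for qconf when
# /etc/profile has not been sourced and the SGE directories are missing.
status = run_in_shell("qconf_not_on_path -sql", "/usr/bin:/bin")
print(status)  # 127, the POSIX shell status for "command not found"
```

The same status shows up in the plugin error because the command string is handed to a shell on the node that has not loaded the profile scripts.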

In the upcoming version you can pass source_profile=True as a parameter
to execute, which will do this for you.
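
Applied to the plugin from the original question, the workaround might look roughly like the sketch below. The with_profile helper is a hypothetical name introduced here, and the try/except fallback exists only so the sketch can be imported on a machine without StarCluster installed. The class name, constructor argument, and run() signature are taken from the plugin quoted further down.

```python
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Fallback for reading or testing this sketch without StarCluster.
    ClusterSetup = object


def with_profile(command):
    """Prefix a command so /etc/profile (and /etc/profile.d) is sourced first."""
    return "source /etc/profile && %s" % command


class SgeConfig(ClusterSetup):
    def __init__(self, queue_to_config):
        self.queue_to_config = queue_to_config

    def run(self, nodes, master, user, user_shell, volumes):
        cmd = ("qconf -mattr queue load_thresholds np_load_avg=1.5 %s"
               % self.queue_to_config)
        # Workaround for current releases: source the profile explicitly
        # so that qconf is on the PATH of the remote shell.
        master.ssh.execute(with_profile(cmd))
        # With the upcoming source_profile parameter this should reduce to:
        #   master.ssh.execute(cmd, source_profile=True)
```

Keeping the prefix in one small helper means only one line needs to change once source_profile=True is available.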

With that said, I'm working on an SGE (now OGS) plugin that will allow
you to apply custom SGE settings like the one above. For those
interested in contributing, you can see the latest progress in the
'sge-plugin' branch on GitHub:

https://github.com/jtriley/StarCluster/blob/sge-plugin/starcluster/plugins/sge.py

This will allow you to do what your plugin is doing now:

[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
load_avg = 1.5
create_queues = myqueue, gpu
scheduling_interval = 5
....
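
As a rough illustration of how an option such as load_avg could map onto the qconf call discussed in this thread, here is a small standalone sketch. It only parses the settings above and builds a command string. The function name load_threshold_command and the mapping itself are assumptions made for illustration and are not taken from the sge-plugin branch; the default queue all.q is the one used earlier in the thread.

```python
import configparser

# The example section from this message, without the trailing "....".
SAMPLE = """
[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
load_avg = 1.5
create_queues = myqueue, gpu
scheduling_interval = 5
"""


def load_threshold_command(config_text, queue="all.q"):
    """Build the qconf command implied by the load_avg setting.

    Hypothetical mapping: the real plugin may translate options differently.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    load_avg = parser.getfloat("plugin sge", "load_avg")
    return ("qconf -mattr queue load_thresholds np_load_avg=%s %s"
            % (load_avg, queue))


print(load_threshold_command(SAMPLE))
```

On a real cluster the resulting string would still have to be executed with the profile sourced, as described above.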

If anyone's interested in contributing patches that implement such
options, please fork the project on GitHub, check out the sge-plugin
branch, make the changes to SGEPlugin, and submit a pull request.

~Justin

On 12/21/11 3:33 AM, Wei Tao wrote:
> Hi Don,
>
> The plugin picked up the queue_to_config (all.q) as evidenced in the
> error message:
>
> !!! ERROR - command 'qconf -mattr queue load_thresholds np_load_avg=1.5 all.q' failed with status 127
>
> My intention is to configure SGE at cluster boot time using the
> plugin. Since I executed "starcluster runplugin" after the cluster had
> already booted, this is apparently not a plugin execution timing issue.
>
> The only reason I run the plugin, or the plugin's command, after the
> cluster has already booted is for debugging purposes.
>
> It's just very strange to me that as root I can execute the exact same
> command on the master node without any issue, but running it from a
> StarCluster plugin fails.
>
> Also, what is status 127 anyway??
>
> Thanks!
>
> -Wei
>
>
> On Wed, Dec 21, 2011 at 1:42 AM, Don MacMillen <macd_at_nimbic.com> wrote:
>
> The only difference that I can see is that I have not used arguments to
> the plugin. I guess you did remember to set the argument "queue_to_config"
> in your config file?
>
> Another possible issue is if you are trying to reconfigure a cluster that
> is still in the process of coming up. If you try that command early on,
> it will fail because SGE has not been installed yet. Why do you want to
> configure the cluster afterwards rather than on the initial bring-up?
> HTH, and let us know what you find out.
> Regards.
>
> Don
>
>
> On Tue, Dec 20, 2011 at 10:02 PM, Wei Tao <wei.tao_at_tsibiocomputing.com> wrote:
>
> Hi all,
>
> I tried to implement the queue configuration suggested by Don MacMillen
> a while ago. Here is my plugin code:
>
> from starcluster.clustersetup import ClusterSetup
>
> class SgeConfig(ClusterSetup):
>     def __init__(self, queue_to_config):
>         self.queue_to_config = queue_to_config
>
>     def run(self, nodes, master, user, user_shell, volumes):
>         cmd_strg = 'qconf -mattr queue load_thresholds np_load_avg=1.5 %s' % self.queue_to_config
>         output = master.ssh.execute(cmd_strg)
>
> When I execute "starcluster runplugin <myplugin> <mycluster>", I get:
>
> >>> Running plugin <myplugin>
> !!! ERROR - command 'qconf -mattr queue load_thresholds np_load_avg=1.5 all.q' failed with status 127
>
> If I log in with sshmaster and run the command directly, like this:
>
> root_at_master:~# qconf -mattr queue load_thresholds np_load_avg=1.5 all.q
> root_at_master modified "all.q" in cluster queue list
>
> It works fine. Could someone please point out why the plugin would get
> status code 127 when direct execution of the command apparently works
> fine?
>
> Thanks for the help!
>
>
> -Wei
> --
> Wei Tao, Ph.D.
> TSI Biocomputing LLC
> 617-564-0934

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7yD3QACgkQ4llAkMfDcrmaQQCcCrSNwpQt53aqTU96MiI9R839
3yYAn1P/CRJjQIvzWLfht3kd3a6mZI1M
=R7Fe
-----END PGP SIGNATURE-----
Received on Wed Dec 21 2011 - 11:55:22 EST

In reply to: Wei Tao: "Re: starcluster plugin status code 127"


This archive was generated by hypermail 2.3.0.
