StarCluster - Mailing List Archive

Re: Configure Nodes to submit jobs

From: Arman Eshaghi <no email>
Date: Wed, 1 Oct 2014 17:26:13 +0330

Please have a look at http://linux.die.net/man/1/qconf or run
"man qconf".

To check whether the scripts are available on a given host, you can run
"df -h". The output shows which paths are mounted from an external host
(your master node). If the scripts are not on one of those paths, you
could move them to a shared folder.
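
As a rough sketch of both checks (not something taken from the thread): the
snippet below reads the device column of "df -P" to guess whether a path is
served from another host, then lists the current submit hosts with
"qconf -ss". The helper name is_remote_device and the default path are made
up for illustration, and it assumes NFS-style "host:/export" device names.

```shell
#!/bin/sh
# Sketch: is a directory mounted from another host, and which hosts may submit jobs?

# Hypothetical helper: succeed if a df "Filesystem" entry looks like a
# remote export (host:/path), as NFS mounts do.
is_remote_device() {
    case "$1" in
        /*)   return 1 ;;   # local block device such as /dev/xvda1
        *:/*) return 0 ;;   # host:/export
        *)    return 1 ;;   # tmpfs, overlay, and so on
    esac
}

# Directory to check; defaults to $HOME purely for illustration.
target="${1:-$HOME}"

# df -P prints one header line and then one line per filesystem;
# field 1 of the second line is the device backing $target.
device=$(df -P "$target" 2>/dev/null | awk 'NR == 2 { print $1 }')

if is_remote_device "$device"; then
    echo "$target is mounted from $device"
else
    echo "$target looks local to this host (device: ${device:-unknown})"
fi

# List the hosts currently allowed to submit jobs (skipped if SGE is absent).
if command -v qconf >/dev/null 2>&1; then
    qconf -ss
    # To add this node as a submit host, from an SGE admin account:
    #   qconf -as "$(hostname)"
fi
```

Run it on the node in question with the directory that holds your job
scripts as the first argument. If the directory turns out to be local, the
scripts will not be visible from the other nodes.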

All the best,
Arman


On Wed, Oct 1, 2014 at 5:10 PM, greg <margeemail_at_gmail.com> wrote:
> Thanks Chris! I'll try those debugging techniques.
>
> So running "qconf -as <nodename>" turns that node into a job submitter?
>
> -Greg
>
> On Wed, Oct 1, 2014 at 8:09 AM, Chris Dagdigian <dag_at_bioteam.net> wrote:
>>
>> 'EQW' is a combination of multiple message states (e)(q)(w). The
>> standard "qw" is familiar to everyone; the E indicates something bad at
>> the job level.
>>
>> There are multiple levels of debugging, starting with easy and getting
>> more cumbersome. Almost all require admin or sudo level access.
>>
>> The first-pass debug method is to run "qstat -j <jobID>" on the job that
>> is in EQW state; that should provide a bit more information about what
>> went wrong.
>>
>> After that, look at the .e and .o STDERR/STDOUT files from the script,
>> if any were created.
>>
>> After that you can use sudo privs to go into
>> $SGE_ROOT/$SGE_CELL/spool/qmaster/ and look at the messages file. There
>> are also per-node messages files you can look at.
>>
>> The next level of debugging after that usually involves setting the
>> sge_execd parameter KEEP_ACTIVE=true, which makes SGE stop deleting
>> the temporary files associated with a job's life cycle. Those files
>> live down in the SGE spool under <executionhost>/active_jobs/<jobID>/
>> and they are invaluable in debugging nasty, subtle job failures.
>>
>> EQW should be easy to troubleshoot though: it indicates a fatal error
>> right at the beginning of the job dispatch or execution process.
>> Nothing subtle there.
>>
>>
>> And if your other question was about nodes being allowed to submit jobs:
>> yes, you have to configure this. It can be done at SGE install time or
>> any time afterwards by running "qconf -as <nodename>" from any account
>> with SGE admin privs. I have no idea whether StarCluster does this
>> automatically, but I'd expect that it probably does. If not, it's
>> an easy fix.
>>
>> -Chris
>>
>>
>> greg wrote:
>>> Hi guys,
>>>
>>> I'm afraid I'm still stuck on this. Besides my original question,
>>> which I'm still not sure about, does anyone have any general advice
>>> on debugging an EQW state? The same software runs fine in our local
>>> cluster.
>>>
>>> thanks again,
>>>
>>> Greg
Received on Wed Oct 01 2014 - 09:56:15 EDT