StarCluster - Mailing List Archive

Re: Fwd: Integration of MPICH2 plugin with SGE

From: Hyokun Yun <no email>
Date: Mon, 19 Aug 2013 11:10:42 -0700

Sergio,


Thanks for the advice!

I have read the document, but why would the daemons reject the task if the
allocation rule is set to $fill_up?
Shouldn't OGE work with either choice? The document doesn't say that I should
not use $fill_up.

I think I gave $round_robin a try, but I will try once again and let you
know whether I had success.
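
For anyone who wants to script the change instead of editing the parallel environment interactively with `qconf -mp orte`, here is a minimal sketch. It assumes the PE definition has been dumped to text (for example with `qconf -sp orte`) and that the rewritten text is loaded back afterwards (I believe `qconf -Mp <file>` does this, but check your Grid Engine man page). The function name `set_allocation_rule` and the sample PE text are hypothetical, not taken from StarCluster or from a real cluster.

```python
import re

# Matches the single "allocation_rule <value>" line of a PE definition.
# Only spaces and tabs are allowed around the value so the match never
# crosses a line boundary.
_RULE_LINE = re.compile(r"^(allocation_rule[ \t]+)\S+[ \t]*$", re.MULTILINE)


def set_allocation_rule(pe_config: str, rule: str = "$round_robin") -> str:
    """Return pe_config with its allocation_rule value replaced by `rule`.

    Raises ValueError unless exactly one allocation_rule line is present,
    so a malformed dump is never silently passed on.
    """
    # A function is used as the replacement so that "$" and backslashes
    # in `rule` are inserted literally.
    new_config, count = _RULE_LINE.subn(lambda m: m.group(1) + rule, pe_config)
    if count != 1:
        raise ValueError(f"expected one allocation_rule line, found {count}")
    return new_config


if __name__ == "__main__":
    # Illustrative PE text only (an assumption, not real cluster output).
    sample = (
        "pe_name            orte\n"
        "slots              24\n"
        "allocation_rule    $fill_up\n"
        "control_slaves     TRUE\n"
    )
    print(set_allocation_rule(sample))
```

The function only rewrites text and never calls `qconf` itself, so it can be checked on a dumped file before anything is applied to the cluster.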

Also, is it possible that this is a problem specific to the AMI I am using?


Best,
Hyokun Yun



On Mon, Aug 19, 2013 at 10:38 AM, Sergio Mafra <sergiohmafra_at_gmail.com> wrote:

> Hi Hyokun,
>
> I'm a user of MPICH2 and OGE.
>
> It seems that you're using $fill_up instead of $round_robin. If so, try to
> change it to $round_robin with $ qconf -mp orte
> You can learn more here:
> http://star.mit.edu/cluster/docs/latest/plugins/sge.html#using-the-plugin
>
> Let me know if this helps you.
>
> All best.
>
> Sergio
>
>
> On Mon, Aug 19, 2013 at 1:53 AM, Hyokun Yun <yun3_at_purdue.edu> wrote:
>
>> Dear starcluster users,
>>
>>
>> I am experiencing a problem using the MPICH2 plugin with SGE.
>>
>> I am using the following image: ami-52a0c53b which uses Ubuntu 12.04
>>
>> When I use the mpich2 plugin, it seems that mpich2 and SGE are not tightly
>> integrated: when I execute my script using qsub, I get the following error
>> message.
>>
>> error: executing task of job 1 failed: execution daemon on host "node001"
>> didn't accept task
>> error: executing task of job 1 failed: execution daemon on host "node002"
>> didn't accept task
>> error: executing task of job 1 failed: execution daemon on host "node003"
>> didn't accept task
>> error: executing task of job 1 failed: execution daemon on host
>> "nodef004" didn't accept task
>>
>> It runs fine when I simply execute 'mpirun' myself, instead of relying on
>> SGE.
>> Also, the same script runs fine when I use OpenMPI instead of
>> MPICH2. That's why I suspect it is an MPICH2 & SGE integration issue.
>>
>> The problem is that I need multi-thread support, and it is by default
>> disabled in OpenMPI. I also prefer to use MPICH2 instead of OpenMPI.
>>
>> I was able to reproduce the problem when I restarted the cluster from
>> scratch. Would any of you please take a look at the problem by trying the
>> same image with the MPICH2 plugin?
>>
>>
>> Thanks,
>> Hyokun Yun
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>
>
>


-- 
Hyokun Yun (http://www.stat.purdue.edu/~yun3)
Ph.D. Candidate
Department of Statistics
Purdue University
Received on Mon Aug 19 2013 - 14:10:44 EDT
This archive was generated by hypermail 2.3.0.