StarCluster - Mailing List Archive

Re: DRMAA jobs failing when load balancer enabled and jobs longer than 60 mins (Lilley, John F.)

From: François-Michel L'Heureux <no email>
Date: Thu, 6 Mar 2014 14:13:48 -0500

Hi John

I assume DRMAA is a replacement for OGS/SGE?

About DRMAA bailing out, I don't know the product, but your guess is likely
correct: it might crash when nodes go away. There is a somewhat similar
issue with OGS, where we need to clean it up when nodes go away. It doesn't
crash, though.

For your second issue, regarding the execution host, I had a similar
issue with OGS. The trick I used was to leave the master node as an
execution host but set its number of slots to 0. OGS is happy
because there is at least one exec host, and the load balancer runs just fine
because when only the master node is online there are no slots, so it
immediately adds a node whenever jobs come in. I don't know if there is a
concept of slots in DRMAA, or if this version of the load balancer uses it,
but if so, I think you could reproduce my trick.
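
To make the reasoning concrete, here is a toy sketch of why a 0-slot master
still triggers scale-up. It is not StarCluster's load balancer code. The
names total_slots and should_add_node and the host name node001 are made up
for illustration, and the rule "add a node when queued jobs exceed the
available slots" is a simplifying assumption.

```python
# Toy model only: NOT StarCluster's actual load balancer logic.
# total_slots, should_add_node and "node001" are hypothetical names.

def total_slots(exec_hosts):
    """Sum the slots configured on every registered execution host."""
    return sum(exec_hosts.values())


def should_add_node(queued_jobs, exec_hosts):
    """Assumed rule: scale up when more jobs are queued than slots exist."""
    return queued_jobs > total_slots(exec_hosts)


# The master stays registered as an exec host, but with 0 slots,
# so the scheduler still sees at least one exec host.
cluster = {"master": 0}
print(should_add_node(3, cluster))  # jobs waiting, no slots anywhere

# Once a worker with 4 slots joins, the 3 queued jobs fit.
cluster["node001"] = 4
print(should_add_node(3, cluster))
```

With only the master online the slot total is 0, so any queued job exceeds
capacity and a node gets requested right away.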

I hope this helps.

Received on Thu Mar 06 2014 - 14:14:11 EST

