StarCluster - Mailing List Archive

Re: Starcluster debug file

From: Paolo Di Tommaso <no email>
Date: Wed, 21 Dec 2011 15:02:49 +0100

I ran some benchmarks a few weeks ago on large cluster deployment with StarCluster.

I already reported the results on this mailing list, but it may be useful to share them again.

It turned out that StarCluster takes about 15 minutes for every 50 nodes it launches (micro instances). Something like:

 - 50 nodes: boot time ~ 15 minutes
 - 100 nodes: ~ 30 minutes
 - 200 nodes: ~ 1 hour

This linear progression makes me think that StarCluster starts the instances serially. But that is only speculation on my part, and I cannot say more.
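
A minimal sketch of the linear model these figures imply, in Python. It assumes the rate stays at roughly 15 minutes per 50 nodes, which was only measured on micro instances up to 200 nodes; the constant and function names are made up for illustration and are not part of StarCluster.

```python
# Assumed rate taken from the figures above: ~15 minutes per 50 nodes.
MINUTES_PER_BATCH = 15
BATCH_SIZE = 50


def estimated_boot_minutes(nodes):
    """Estimate boot time in minutes, assuming strictly linear scaling."""
    return nodes * MINUTES_PER_BATCH / BATCH_SIZE


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, "nodes:", estimated_boot_minutes(n), "minutes")
```

If the instances were started in parallel, the boot time would be expected to grow much more slowly than this estimate as the node count increases.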

In any case, solving this problem would be a huge improvement for StarCluster.


Cheers,
Paolo





On Dec 21, 2011, at 12:25 PM, Sumita Sinha wrote:

Hello Justin,

I restarted a 200-node cluster, and this time the "Installing Sun Grid Engine" step took a very long time.

>>> Installing Sun Grid Engine
197/199 |///////////////////////////////////////////////////////////////| 98%

I ran tail on the ~/.starcluster/logs/debug.log file and could see the line below being repeated.

2011-12-21 11:09:06,152 PID: 17885 threadpool.py:136 - DEBUG - unfinished_tasks = 2
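
To track this counter over time rather than reading the tail output by eye, a minimal Python sketch like the following pulls the timestamp and the unfinished_tasks value out of a line in the format shown above. The function name is hypothetical and the line layout is assumed from this single sample; none of this is part of StarCluster.

```python
import re
from datetime import datetime

# Pattern assumed from the single sample line above.
_UNFINISHED = re.compile(r"unfinished_tasks = (\d+)")


def parse_unfinished_tasks(line):
    """Return (timestamp, count) for a matching debug line, else None."""
    match = _UNFINISHED.search(line)
    if match is None:
        return None
    # The first 23 characters hold the timestamp, e.g. "2011-12-21 11:09:06,152".
    timestamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    return timestamp, int(match.group(1))


if __name__ == "__main__":
    sample = ("2011-12-21 11:09:06,152 PID: 17885 threadpool.py:136 "
              "- DEBUG - unfinished_tasks = 2")
    print(parse_unfinished_tasks(sample))
```

Comparing the first and last timestamps at which the count stays at the same value gives a rough idea of how long the step has been stalled.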

I waited for almost 50 minutes and then had to terminate the cluster.



--
Regards
Sumita Sinha
_______________________________________________
StarCluster mailing list
StarCluster_at_mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Wed Dec 21 2011 - 09:02:54 EST
