I ran some benchmarks a few weeks ago on large cluster deployment with StarCluster.
I reported the results on this mailing list, but it may be worth sharing them again.
It turned out that StarCluster takes roughly 15 minutes per 50 nodes to launch (micro instances). Something like:
- 50 nodes: boot time ~ 15 minutes
- 100 nodes: ~ 30 minutes
- 200 nodes: ~ 1 hour
This linear progression made me think that StarCluster uses a serial mechanism to start the instances. But this is only speculation on my part, and I cannot say more.
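To make the extrapolation explicit, here is a rough sketch of the linear estimate implied by the numbers above. The function name and constants are only mine for illustration, and it assumes the trend stays linear beyond the sizes I actually measured:

```python
# Rough linear fit to the timings reported above (micro instances).
# Assumption: launch time keeps growing linearly with node count.
MINUTES_PER_BATCH = 15   # observed: ~15 minutes ...
NODES_PER_BATCH = 50     # ... for every 50 nodes


def estimated_boot_minutes(nodes):
    """Extrapolate cluster launch time in minutes for a given node count."""
    return nodes * MINUTES_PER_BATCH / NODES_PER_BATCH


if __name__ == "__main__":
    for n in (50, 100, 200):
        print("%d nodes: ~%.0f minutes" % (n, estimated_boot_minutes(n)))
```

If the instances were started in parallel, I would expect the time to stay closer to flat as the node count grows, instead of following this line.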
Anyway, solving this problem would be a huge improvement for StarCluster.
On Dec 21, 2011, at 12:25 PM, Sumita Sinha wrote:
I restarted a 200-node cluster, and this time the "Installing Sun Grid Engine" step was taking a long time.
>>> Installing Sun Grid Engine
197/199 |///////////////////////////////////////////////////////////////| 98%
I ran tail on the ~/.starcluster/logs/debug.log file and could see the line below being repeated.
2011-12-21 11:09:06,152 PID: 17885 threadpool.py:136 - DEBUG - unfinished_tasks = 2
I waited for almost 50 minutes and then had to terminate the cluster.
StarCluster mailing list
Received on Wed Dec 21 2011 - 09:02:54 EST