I've since terminated the cluster and am experimenting with a different
setup, but here's the output from qstat and qhost:
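(To capture these as text files instead, per Rajat's request below,
something along these lines works; the file names are just examples:

ec2-user@master:~$ qstat > qstat.txt    # dump job/queue status
ec2-user@master:~$ qhost > qhost.txt    # dump per-host load and memory

I'm pasting the output inline here.)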
ec2-user@master:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      4 0.55500 j1-00493-0 ec2-user     r     09/18/2013 17:38:44 all.q@node001                      8
      6 0.55500 j1-00508-0 ec2-user     r     09/18/2013 17:45:44 all.q@node002                      8
      7 0.55500 j1-00525-0 ec2-user     r     09/18/2013 17:46:29 all.q@node003                      8
      8 0.55500 j1-00541-0 ec2-user     r     09/18/2013 17:54:59 all.q@node004                      8
      9 0.55500 j1-00565-0 ec2-user     r     09/18/2013 17:55:44 all.q@node005                      8
     10 0.55500 j1-00596-0 ec2-user     r     09/18/2013 17:58:59 all.q@node006                      8
     11 0.55500 j1-00604-0 ec2-user     r     09/18/2013 18:05:14 all.q@node007                      8
     12 0.55500 j1-00625-0 ec2-user     r     09/18/2013 18:05:14 all.q@node008                      8
     13 0.55500 j1-00650-0 ec2-user     r     09/18/2013 18:05:14 all.q@node009                      8
     18 0.55500 j1-00734-0 ec2-user     r     09/18/2013 18:07:29 all.q@node010                      8
     19 0.55500 j1-00738-0 ec2-user     r     09/18/2013 18:16:59 all.q@node011                      8
     20 0.55500 j1-00739-0 ec2-user     r     09/18/2013 18:16:59 all.q@node012                      8
     21 0.55500 j1-00770   ec2-user     r     09/18/2013 18:16:59 all.q@node013                      8
     22 0.55500 j1-00806-0 ec2-user     r     09/18/2013 18:16:59 all.q@node014                      8
     23 0.55500 j1-00825-0 ec2-user     r     09/18/2013 18:16:59 all.q@node015                      8
     24 0.55500 j1-00826-0 ec2-user     r     09/18/2013 18:16:59 all.q@node016                      8
     25 0.55500 j1-00846-0 ec2-user     r     09/18/2013 18:16:59 all.q@node017                      8
     26 0.55500 j1-00847-0 ec2-user     r     09/18/2013 18:16:59 all.q@node018                      8
     27 0.55500 j1-00913   ec2-user     r     09/18/2013 18:16:59 all.q@node019                      8
     28 0.55500 j1-00914-0 ec2-user     r     09/18/2013 18:16:59 all.q@node020                      8
     29 0.55500 j1-00914   ec2-user     r     09/18/2013 18:26:29 all.q@node021                      8
     30 0.55500 j1-00922   ec2-user     r     09/18/2013 18:26:29 all.q@node022                      8
     31 0.55500 j1-00977   ec2-user     r     09/18/2013 18:26:29 all.q@node023                      8
     32 0.55500 j1-00984-0 ec2-user     r     09/18/2013 18:26:29 all.q@node024                      8
     33 0.55500 j1-00984   ec2-user     r     09/18/2013 18:26:29 all.q@node025                      8
     34 0.55500 j1-00998-0 ec2-user     r     09/18/2013 18:26:29 all.q@node026                      8
     35 0.55500 j1-01010-0 ec2-user     r     09/18/2013 18:26:29 all.q@node027                      8
     36 0.55500 j1-01019-0 ec2-user     r     09/18/2013 18:26:29 all.q@node028                      8
     37 0.55500 j1-01025-0 ec2-user     r     09/18/2013 18:26:29 all.q@node029                      8
     38 0.55500 j1-01026-0 ec2-user     r     09/18/2013 18:26:29 all.q@node030                      8
ec2-user@master:~$ qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node001                 linux-x64       8  7.74    6.8G    3.8G     0.0     0.0
node002                 linux-x64       8  7.93    6.8G    3.7G     0.0     0.0
node003                 linux-x64       8  7.68    6.8G    3.7G     0.0     0.0
node004                 linux-x64       8  7.86    6.8G    3.8G     0.0     0.0
node005                 linux-x64       8  7.87    6.8G    3.7G     0.0     0.0
node006                 linux-x64       8  7.66    6.8G    3.7G     0.0     0.0
node007                 linux-x64       8  0.01    6.8G  564.8M     0.0     0.0
node008                 linux-x64       8  0.01    6.8G  493.6M     0.0     0.0
node009                 linux-x64       8  0.02    6.8G  564.4M     0.0     0.0
node010                 linux-x64       8  7.85    6.8G    3.7G     0.0     0.0
node011                 linux-x64       8  7.53    6.8G    3.7G     0.0     0.0
node012                 linux-x64       8  7.57    6.8G    3.6G     0.0     0.0
node013                 linux-x64       8  7.71    6.8G    3.7G     0.0     0.0
node014                 linux-x64       8  7.49    6.8G    3.7G     0.0     0.0
node015                 linux-x64       8  7.51    6.8G    3.7G     0.0     0.0
node016                 linux-x64       8  7.50    6.8G    3.6G     0.0     0.0
node017                 linux-x64       8  7.89    6.8G    3.7G     0.0     0.0
node018                 linux-x64       8  7.50    6.8G    3.7G     0.0     0.0
node019                 linux-x64       8  7.52    6.8G    3.7G     0.0     0.0
node020                 linux-x64       8  7.68    6.8G    3.6G     0.0     0.0
node021                 linux-x64       8  7.16    6.8G    3.6G     0.0     0.0
node022                 linux-x64       8  6.99    6.8G    3.6G     0.0     0.0
node023                 linux-x64       8  6.80    6.8G    3.6G     0.0     0.0
node024                 linux-x64       8  7.20    6.8G    3.6G     0.0     0.0
node025                 linux-x64       8  6.86    6.8G    3.6G     0.0     0.0
node026                 linux-x64       8  7.24    6.8G    3.6G     0.0     0.0
node027                 linux-x64       8  6.88    6.8G    3.7G     0.0     0.0
node028                 linux-x64       8  6.28    6.8G    3.6G     0.0     0.0
node029                 linux-x64       8  7.42    6.8G    3.6G     0.0     0.0
node030                 linux-x64       8  0.10    6.8G  390.4M     0.0     0.0
node031                 linux-x64       8  0.06    6.8G  135.0M     0.0     0.0
node032                 linux-x64       8  0.04    6.8G  135.3M     0.0     0.0
node033                 linux-x64       8  0.07    6.8G  135.6M     0.0     0.0
node034                 linux-x64       8  0.10    6.8G  134.9M     0.0     0.0
I never saw anything unusual in either output.
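For what it's worth, here's a rough sanity check straight from the shell. It
assumes the averages are derived from SGE's accounting data (e.g. qacct),
which may not be exactly how loadbalance computes them; the point is just
that an empty or unparsed job history averages out to 0 secs, the same
symptom reported below:

# Average wallclock time of all finished jobs, from SGE accounting.
# With no records to parse, the average falls back to "0 secs".
qacct -j 2>/dev/null | awk '
    /^ru_wallclock/ { sum += $2; n++ }
    END { printf "Avg job duration: %d secs\n", (n ? sum / n : 0) }
'

If qacct intermittently returns nothing while the balancer is polling, that
alone could explain the alternating 0-sec readings.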
On Wed, Sep 18, 2013 at 10:40 AM, Rajat Banerjee <rajatb@post.harvard.edu> wrote:
> Ryan,
> Could you put the output of qhost and qstat into a text file and send it
> back to the list? That's what feeds the load balancer those stats.
>
> Thanks,
> Rajat
>
>
> On Tue, Sep 17, 2013 at 11:47 PM, Ryan Golhar
> <ngsbioinformatics@gmail.com> wrote:
>
>> I'm running a cluster with over 800 jobs queued, and I'm running
>> loadbalance. Every other query by loadbalance shows an avg job duration
>> and wait time of 0 secs. Why is this? It hasn't caused a problem yet,
>> but it seems odd.
>>
>> >>> Loading full job history
>> Execution hosts: 19
>> Queued jobs: 791
>> Oldest queued job: 2013-09-17 22:19:23
>> Avg job duration: 3559 secs
>> Avg job wait time: 12389 secs
>> Last cluster modification time: 2013-09-18 00:11:31
>> >>> Not adding nodes: already at or above maximum (1)
>> >>> Sleeping...(looping again in 60 secs)
>>
>> Execution hosts: 19
>> Queued jobs: 791
>> Oldest queued job: 2013-09-17 22:19:23
>> Avg job duration: 0 secs
>> Avg job wait time: 0 secs
>> Last cluster modification time: 2013-09-18 00:11:31
>> >>> Not adding nodes: already at or above maximum (1)
>> >>> Sleeping...(looping again in 60 secs)
>>