StarCluster - Mailing List Archive

Eqw errors in SGE with default starcluster configuration

From: Josh Moore <no email>
Date: Tue, 7 Feb 2012 02:30:15 -0500

I tried submitting a batch of jobs with qsub, using a script that works fine
under SGE on another (non-Amazon) cluster. On a cluster configured with
StarCluster, however, only the first 8 jobs enter the queue without error,
and all of those run immediately on the master node. The remaining jobs go
into the Eqw state. Even if I delete one of the jobs on the master node,
another one never takes its place. The cluster has 8 c1.xlarge nodes with
8 cores each. Here is the output of qconf -ssconf:

algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=0.50
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info false
flush_submit_sec 0
flush_finish_sec 0
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.010000
weight_waiting_time 0.000000
weight_deadline 3600000.000000
weight_urgency 0.100000
weight_priority 1.000000
max_reservation 0
default_duration INFINITY

I can't figure out how to change schedd_job_info to true to find out more
about the error message...
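
For reference, one possible way to flip that setting without the interactive
editor is sketched below. It assumes a stock SGE 6.x install where
"qconf -Msconf <file>" loads a scheduler configuration from a file, and that
it is run as an SGE manager/admin user on the master node. The helper name
enable_job_info is made up for this example. "qconf -msconf" does the same
thing interactively by opening the configuration in $EDITOR.

```shell
#!/bin/sh
# Hypothetical helper: read a scheduler config dump on stdin and write it to
# stdout with the schedd_job_info line set to "true". Other lines pass through.
enable_job_info() {
    sed 's/^schedd_job_info[[:space:]].*/schedd_job_info true/'
}

# Only touch the live scheduler configuration when the SGE tools are on PATH.
if command -v qconf >/dev/null 2>&1; then
    tmp=$(mktemp)
    # Dump the current config, rewrite the one setting, and load it back.
    qconf -ssconf | enable_job_info > "$tmp"
    qconf -Msconf "$tmp"
    rm -f "$tmp"
fi
```

Once the setting is active, "qstat -j <job_id>" should include scheduling
information for pending jobs. For a job already in Eqw, the same command
usually prints an "error reason" line even without this setting, and
"qmod -cj <job_id>" clears the error state after the cause is fixed.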
