Re: Eqw errors in SGE with default starcluster configuration
I'd check /opt/sge6/default/spool/qmaster/messages to see whether it says anything useful about what is happening. It will generally tell you why it's not queuing an additional job. Are the parallel environments set up the same way on your two clusters?
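Something along these lines might help gather that information in one go. It is only a rough sketch: the log path is the one mentioned above, `ssconf_value` and `diagnose_eqw` are hypothetical helper names made up for illustration, and the job id in the usage comments is a placeholder. The SGE commands themselves (`qstat -j`, `qconf -spl`, `qconf -ssconf`, `qconf -msconf`, `qmod -cj`) are standard, but their output may differ between SGE versions.

```shell
#!/bin/sh
# Assumed location of the qmaster log (the path mentioned in this message);
# override QMASTER_LOG if your install differs.
QMASTER_LOG="${QMASTER_LOG:-/opt/sge6/default/spool/qmaster/messages}"

# Hypothetical helper: print the value of one key from `qconf -ssconf`
# style output ("key   value" per line) read on stdin.
ssconf_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Hypothetical helper: collect the usual diagnostics for one job in Eqw.
diagnose_eqw() {
    job_id="$1"
    tail -n 50 "$QMASTER_LOG"   # recent qmaster messages
    qstat -j "$job_id"          # job details, including any "error reason" line
    qconf -spl                  # names of the configured parallel environments
    qconf -ssconf | ssconf_value schedd_job_info   # current schedd_job_info setting
}

# Usage on the master node (42 is a placeholder job id):
#   diagnose_eqw 42
#   qconf -msconf    # opens the scheduler config in $EDITOR; set schedd_job_info to true
#   qmod -cj 42      # clear the job's error state once the cause is fixed
```

The `qconf -msconf` step is also the usual way to change schedd_job_info, which is what the question below asks about. Comparing the `qconf -spl` output (and `qconf -sp <pe_name>` for each entry) between the two clusters should show whether the parallel environments differ.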
Dustin
On Feb 7, 2012, at 2:30 AM, Josh Moore wrote:
> I tried submitting a bunch of jobs using qsub with a script that works fine on another (non-Amazon) cluster's configuration of SGE. But on a cluster configured with StarCluster, only the first 8 (on a cluster of c1.xlarge nodes, so 8 cores each) enter the queue without error (all of those are immediately executed on the master node). Even if I delete one of the jobs on the master node, another one never takes its place. I have a cluster of 8 c1.xlarge nodes. Here is the output of qconf -ssconf:
>
> algorithm default
> schedule_interval 0:0:15
> maxujobs 0
> queue_sort_method load
> job_load_adjustments np_load_avg=0.50
> load_adjustment_decay_time 0:7:30
> load_formula np_load_avg
> schedd_job_info false
> flush_submit_sec 0
> flush_finish_sec 0
> params none
> reprioritize_interval 0:0:0
> halftime 168
> usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor 5.000000
> weight_user 0.250000
> weight_project 0.250000
> weight_department 0.250000
> weight_job 0.250000
> weight_tickets_functional 0
> weight_tickets_share 0
> share_override_tickets TRUE
> share_functional_shares TRUE
> max_functional_jobs_to_schedule 200
> report_pjob_tickets TRUE
> max_pending_tasks_per_job 50
> halflife_decay_list none
> policy_hierarchy OFS
> weight_ticket 0.010000
> weight_waiting_time 0.000000
> weight_deadline 3600000.000000
> weight_urgency 0.100000
> weight_priority 1.000000
> max_reservation 0
> default_duration INFINITY
>
> I can't figure out how to change schedd_job_info to true to find out more about the error message...
Received on Tue Feb 07 2012 - 11:03:03 EST