Dear Justin,
Thank you very much for your clear and full answer.
Yes, I completely agree with you that for loosely coupled tasks, and
especially when they are run routinely every day, a "queuing system" is an
excellent solution. My initial harshness on this question was influenced by
the background I come from, namely MPI. I thought that once a user has
cluster computing nodes available "on demand" together with MPI, the
"queuing system" is eliminated as a class from "cloud computing", because
MPI comes with its own task dispatcher and the user can directly acquire a
cluster configuration as powerful as his task needs, without waiting for
suitable resources to become available. Now I see that there are many other
applications that are better run on a cluster through a pre-configured
"queuing system" than by hand on a heap of nodes. Thank you.
And could I just confirm once again: if a single user needs to run an MPI
task only from time to time (not on a routine everyday basis), would he get
any additional benefit from a "queuing system" in a cloud, or is it better
to use MPI directly?
Thank you in advance, sincerely yours,
Alexey
On Sat, Oct 23, 2010 at 6:37 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> Alexey,
>
> The Sun Grid Engine queueing system is useful when you have a lot of tasks
> to execute and not just one at a time interactively. For example, you might
> need to convert 300 videos from one format to another. You could either
>
> 1. Write a script that gets the list of nodes from /etc/hosts and then
> loops over the jobs and the nodes, ssh'ing commands to be executed on each
> node. A big problem with this approach is that task execution and
> management both depend on this script executing successfully all the way
> through. What happens if the script fails? You would then lose all task
> accounting information. Also, what if you suddenly discover you need to do
> another batch of 300 videos while the previous batch is still processing?
> Are you going to re-execute your script and overload the cluster? This would
> definitely slow down all of your jobs. How will you write your script to
> avoid overloading the cluster in this situation without losing the fact that
> you want to submit new jobs *now*?
>
> OR
>
> 2. Skip needing to get the list of nodes and ssh'ing commands to them and
> instead just write a loop that sends 300 jobs to the queuing system using
> "qsub". The queuing system will then do the work to find an available node,
> execute the job, and store its accounting information (status, start time,
> end time, which node executed the job, etc.). The queuing system will also
> handle load balancing your tasks across the cluster so that any one node
> doesn't get significantly overloaded compared to the other nodes in the
> cluster. If you suddenly discover you need 300 more videos processed you
> could simply "qsub" 300 more jobs. These jobs will be 'queued-up' and
> executed when a node becomes available. This approach reduces your concerns
> to just executing a task on a node rather than managing multiple jobs and
> nodes.
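
The loop described in option 2 could look roughly like the sketch below.
It is only an illustration under several assumptions: the videos sit in a
hypothetical "videos" directory as .avi files, ffmpeg does the conversion,
and the helper names build_qsub_command and submit_all are made up for this
example. The qsub options used (-b y, -cwd, -N) are standard Sun Grid Engine
options, but check them against your own installation.

```python
import subprocess
from pathlib import Path


def build_qsub_command(video, out_dir="converted"):
    """Return the argv list that submits one conversion job via qsub."""
    src = Path(video)
    dst = Path(out_dir) / (src.stem + ".mp4")
    return [
        "qsub",
        "-b", "y",                  # the command is a binary, not a job script
        "-cwd",                     # run the job in the current directory
        "-N", "conv_" + src.stem,   # job name (assumed prefix, shown by qstat)
        "ffmpeg", "-i", str(src), str(dst),  # assumed conversion command
    ]


def submit_all(videos, run=subprocess.run):
    """Submit one job per video; the queuing system chooses the nodes."""
    commands = [build_qsub_command(v) for v in videos]
    for cmd in commands:
        run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical input location; adjust to where the videos really are.
    submit_all(sorted(Path("videos").glob("*.avi")))
```

Note that the script never looks at /etc/hosts or opens an ssh connection.
Running it a second time for another batch simply queues more jobs behind
the ones already waiting.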
>
> Also it is true that you can create "as many clusters as you want" with
> cloud computing. However, in many cases it could get *very* expensive
> launching multiple clusters for every single task or set of tasks. Whether
> it's more cost effective to launch multiple clusters or just queue a ton of
> jobs on a single cluster depends highly on the sort of tasks you're
> executing.
>
> Of course, just because a queueing system is installed doesn't mean you
> *have* to use it at all. You can of course run things however you want on
> the cluster. Hopefully I've made it clear that there are significant
> advantages to using a queuing system to execute jobs on a cluster rather
> than a home-brewed script.
>
> Hope that helps...
>
> ~Justin
>
>
> On 10/22/10 5:02 PM, Alexey PETROV wrote:
>
> Yes, StarCluster is great.
> But what do we need any "*queuing system*" for?
> Surely, in cloud computing, user can create as many clusters as he wants,
> each for his particular tasks.
> So, why?!
>
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
Received on Sat Oct 23 2010 - 14:19:25 EDT