StarCluster - Mailing List Archive

Re: Fast shared or local storage?

From: Cedar McKay <no email>
Date: Fri, 9 May 2014 10:08:52 -0700

Thanks for the very useful reply. I think I'm going to go with the s3fs option and cache to local ephemeral drives. A big Blast database is split into many parts, and I'm pretty sure that not every file in a Blast DB is read on every run, so this way blasting can proceed immediately: parts of the database are downloaded from S3 on demand and cached locally. If there were much writing, I'd be reluctant to use this approach, because the S3 eventual-consistency model seems to require tolerating write failures at the application level. I'll write my results to a shared NFS volume.
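For anyone trying the same thing, a minimal sketch of that mount (bucket name, mount point, and cache path are all placeholders; s3fs's `use_cache` option keeps fetched objects on local disk, and `ro` mounts read-only):

```shell
# Mount the Blast DB bucket read-only via s3fs, caching fetched parts
# on the instance's ephemeral storage so repeat reads stay local.
mkdir -p /mnt/blastdb /mnt/ephemeral/s3fs-cache
s3fs my-blast-bucket /mnt/blastdb -o ro -o use_cache=/mnt/ephemeral/s3fs-cache
```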

I thought about mpiBLAST and will probably explore it, but I've read reports that its XML output isn't exactly the same as the official NCBI Blast output and may break Biopython parsing. I haven't confirmed this, and will probably compare the two approaches.

Thanks again!

Cedar



On May 8, 2014, at 10:56 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:

> On Thu, May 8, 2014 at 4:46 PM, Cedar McKay <cmckay_at_uw.edu> wrote:
> My use-case is to blast (briefly: blast is an alignment search tool) against a large ~150GB read-only reference database. I'm struggling to figure out how to give each of my nodes access to this database while maximizing performance. The shared volume need only be read-only, but write would be nice too.
>
> Chris from the BioTeam (http://bioteam.net/) knows much more about Blast, but I will try to answer the questions from an AWS developer's point of view.
>
> (BTW, in case you didn't know... in some cases mpiBlast can give you super-linear speedup when the input DB is larger than the main memory of each node.)
>
>
> Does anyone have advice about the best approach?
> My ideas:
> After starting the cluster, copy the database to the ephemeral storage of each node?
> If you put some simple logic in the SGE job script to pull the DB from S3 and store it locally the first time blastall runs on the node (i.e. subsequent jobs read from the local copy), then this would give you the best performance and the lowest cost.
>
> * Note that SGE can schedule multiple jobs onto the same node, so you will need some logic to make sure that only 1 transfer is done.
>
> * Most (but not all) instance types give you over 150GB of ephemeral storage that you can read/write without additional cost!
>
> * Note that intra-region S3 to EC2 data transfer is free, but the speed was below 80 MB/s last time we benchmarked it (even with instances that have 1GbE), so the overhead for the initial transfer will be around 30 mins.
>
> * IMO, this is the easiest as you don't need to set anything else up and all you need is a few lines of shell scripting.
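A minimal sketch of the pull-once-per-node logic described above, using flock(1) so that concurrent jobs SGE schedules onto the same node wait for one download instead of starting their own (bucket name, paths, and the s3cmd client are placeholders/assumptions):

```shell
#!/bin/sh
# One-time DB fetch per node: the first job downloads, later jobs on the
# same node block on the lock, then see the completion marker and skip.
DB_DIR=/mnt/blastdb              # ephemeral storage mount (assumption)
S3_URI=s3://my-bucket/blastdb    # placeholder bucket/prefix
mkdir -p "$DB_DIR"
(
    flock -x 9                   # exclusive lock: one transfer per node
    if [ ! -f "$DB_DIR/.complete" ]; then
        s3cmd sync "$S3_URI/" "$DB_DIR/" && touch "$DB_DIR/.complete"
    fi
) 9>"$DB_DIR/.lock"
# ... run blastall against $DB_DIR here ...
```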
>
> Create separate EBS volumes for each node starting from a snapshot containing my reference database. But I don't see a way to automate this.
> Keep in mind that main memory can't cache that much data, so every Blast job would re-read the data from EBS, and each full 150GB pass would cost you around $0.49 in EBS I/O operations alone.
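One way that $0.49 figure works out, assuming 2014-era standard-EBS pricing of roughly $0.05 per million I/O requests and an average of about 16 KB per request (both assumptions, not stated above):

```shell
# Back-of-envelope: 150 GB read in ~16 KB requests, billed per million I/Os.
awk 'BEGIN {
    ios = 150 * 2^30 / (16 * 1024)        # number of 16 KB reads in 150 GB
    printf "%d I/Os -> $%.2f per full pass\n", ios, ios * 0.05 / 1e6
}'
# prints "9830400 I/Os -> $0.49 per full pass"
```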
>
> glusterfs. I saw reference to a glusterfs starcluster plugin a while back, but it doesn't seem to be in the current list of plugins.
> IMO, it's too much work if all you need is to read input data.
>
> s3fs. But is random access within a file poor? Even with caching turned on?
> This may be the second-best option, as I assume you will have lots of queued jobs; the 150GB of input data is read once from S3 and then accessed many times locally.
>
> Stick with default approach (nfs share a volume), but provision the headnode for faster networking? Provisioned IOPS EBS volumes? Any other simple optimizations?
> If you just have a few execution (slave) nodes, it would work too. Just create a PIOPS EBS volume, then mount it and NFS-share it by specifying the values in the StarCluster config file:
>
> http://star.mit.edu/cluster/docs/latest/manual/configuration.html#amazon-ebs-volumes
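For reference, the volume section the page above describes looks roughly like this (volume ID, names, and paths are placeholders):

```ini
[volume blastdata]
# Placeholder ID; a PIOPS volume created from a snapshot holding the DB
VOLUME_ID = vol-xxxxxxxx
MOUNT_PATH = /blastdb

[cluster smallcluster]
VOLUMES = blastdata
```

StarCluster mounts the volume on the master node and NFS-exports the mount path to the worker nodes automatically.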
>
> For a larger number of nodes, the NFS server is still the bottleneck; S3 is much more scalable than a single NFS master. I would copy the DB from S3 to the local instance, or use s3fs, if I had, say, over 8 (YMMV) nodes in the cluster.
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
> I really appreciate any help.
> Thanks,
> Cedar
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
Received on Fri May 09 2014 - 13:08:59 EDT
