Fast shared or local storage?
Hello, first thank you for creating this amazing product. I'm amazed by how easy it is to set up a working cluster in short order. Thanks!
I hope you'll be able to give me some advice, since I'm new to both AWS and StarCluster.
My use case is to run BLAST (briefly: BLAST is a sequence alignment search tool) against a large ~150GB read-only reference database. I'm struggling to figure out how to give each of my nodes access to this database while maximizing performance. The shared volume only needs to be read-only, though write access would be nice too.
The default approach seems to be to mount an EBS volume on the master, put the database on it, and have all the nodes access it via NFS. Very straightforward. But I fear performance will be limited either by the EBS volume itself or by the network connection between the master and the nodes, since many nodes will be reading the same files at the same time.
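For concreteness, I think that default setup would look roughly like this in the StarCluster config (the volume name, volume ID, AMI, instance type, and mount path below are placeholders I made up):

[volume blastdb]
# EBS volume holding the reference database
VOLUME_ID = vol-xxxxxxxx
# Mounted on the master and NFS-shared to the nodes
MOUNT_PATH = /data/blastdb

[cluster blastcluster]
KEYNAME = mykey
CLUSTER_SIZE = 8
NODE_IMAGE_ID = ami-xxxxxxxx
NODE_INSTANCE_TYPE = c3.xlarge
VOLUMES = blastdb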
Does anyone have advice about the best approach?
My ideas:
After starting the cluster, copy the database to each node's ephemeral storage? (A rough plugin sketch of what I mean is below, after this list.)
Create a separate EBS volume for each node from a snapshot containing my reference database. But I don't see a way to automate this. (Also sketched below.)
GlusterFS. I saw a reference to a GlusterFS StarCluster plugin a while back, but it doesn't seem to be in the current list of plugins.
s3fs. But is random access within a file poor, even with caching turned on?
Stick with the default approach (NFS-share a volume), but provision the head node for faster networking? Provisioned IOPS EBS volumes? Any other simple optimizations?
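To make the first idea concrete, here is a minimal sketch of the kind of plugin I have in mind, assuming the database is NFS-shared at /data/blastdb and each node's ephemeral storage is mounted at /mnt (the file name, class name, and paths are placeholders, and this is untested):

# copy_blastdb.py -- rough, untested sketch
from starcluster.clustersetup import ClusterSetup
from starcluster.logger import log

class CopyBlastDB(ClusterSetup):
    """Copy the BLAST database from the NFS share to each node's ephemeral disk."""

    def __init__(self, src="/data/blastdb", dest="/mnt/blastdb"):
        self.src = src    # NFS-shared copy served by the master
        self.dest = dest  # local copy on each node's ephemeral storage

    def run(self, nodes, master, user, user_shell, volumes):
        for node in nodes:
            log.info("Copying BLAST database to %s" % node.alias)
            node.ssh.execute("mkdir -p %s" % self.dest)
            node.ssh.execute("cp -r %s/. %s/" % (self.src, self.dest))

I'd then point a [plugin] section in the config at copy_blastdb.CopyBlastDB. My worry is that copying ~150GB to every node, one node at a time over NFS, might take long enough to defeat the purpose.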
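And for the second idea, this is the sort of automation I'm picturing, done outside StarCluster with plain boto (region, snapshot ID, instance IDs, and device name are placeholders; also untested):

# attach_db_volumes.py -- rough, untested sketch
import time
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

snapshot_id = "snap-xxxxxxxx"                 # snapshot of the reference database
instance_ids = ["i-aaaaaaaa", "i-bbbbbbbb"]   # the cluster nodes
device = "/dev/sdf"

for instance_id in instance_ids:
    instance = conn.get_only_instances([instance_id])[0]
    # The volume has to be created in the same availability zone as the instance;
    # volume_type="io1" and iops=... could go here for provisioned IOPS.
    vol = conn.create_volume(size=160, zone=instance.placement,
                             snapshot=snapshot_id)
    while vol.status != "available":
        time.sleep(5)
        vol.update()
    conn.attach_volume(vol.id, instance_id, device)

Each node would still need to mount the attached volume, and I'd have to remember to delete all the volumes at shutdown, which is why I'd prefer something built into the cluster setup.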
I really appreciate any help.
Thanks,
Cedar