From: Dan Yamins
Date: Wed, 4 Apr 2012 08:57:01 -0400

1) What is the format of your data? And how big are the entries? I like
MongoDB for this sort of thing, but it may depend on what kind of thing you
want to store.

2) ssh tunnels may be a good solution for having a common DB backing the
cluster. Basically, if you use a DB that is accessible as a service on a
port, then if you ssh tunnel from the various worker nodes to the node
running the DB, software running on the worker nodes can act "as if" the
database were purely local.

In other words, do three things:

    A) set up a single DB actually running on one designated node, on some
port, e.g. port 27017 on master.

    B) write code in your worker that pretends the DB is local on the port
(here's pythonesque code for mongoDB):

     import pymongo
     connection = pymongo.MongoClient(host='localhost', port=27017)
     collection = connection['my_database']['my_collection']

    C) and then separately establish an ssh tunnel from the worker node
to the master (or wherever the single DB is running). This can be done in
a starcluster plugin in the "add_node" or "run" methods like this:

          workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017
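
As a rough sketch of the tunnel command from C): the helper below just builds
the command string. The function name tunnel_command and the host name
"master" are assumptions made for illustration; the target host in the
command above is cut off, so substitute whatever node actually runs your DB.

```python
def tunnel_command(db_host, port=27017):
    """Build an ssh command that forwards a local port to the DB node."""
    # -f: go to the background after authentication
    # -N: do not run a remote command, only forward ports
    # -L port:localhost:port: connections to the worker's local port are
    #    forwarded to the same port on db_host ("localhost" is resolved
    #    on the db_host side of the tunnel)
    return "ssh -f -N -L %d:localhost:%d %s" % (port, port, db_host)


# "master" is a placeholder host name (assumption, see above).
print(tunnel_command("master"))
```

In a plugin, the resulting string would be what gets passed to
workernode.ssh.execute(...) for each worker.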

Of course you could start this by hand on all the nodes as well, but that
gets a little tedious, and the plugin system is perfect for this kind of

Having done A), B), and C), when you run the code in B) on your worker
node, the code will simply read and write to the single master database
from A) without having to know anything about the fact that it's running on a
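
One thing that may be worth doing before running the code in B) is checking
that the tunnel from C) is actually up. Here is a minimal sketch using only
the standard library; the function name tunnel_is_up and the 2 second timeout
are my own choices for illustration. Note that it only tells you something is
listening on the port, not that it is really your database.

```python
import socket


def tunnel_is_up(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # On a worker with the tunnel established this should print True.
    print(tunnel_is_up())
```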

On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl wrote:

> Hello,
> I would like to use StarCluster to do some web scraping and I'd like to
> store the collected data in a DB that is available to all of the cluster
> nodes. Is there a way to have a common DB backing the entire cluster? Any
> particular DBs that anyone has had success with?
> Thanks for your assistance!
> Chris
Received on Wed Apr 04 2012 - 09:02:14 EDT
