1) Typo: the help for the '-n' option in the 'get' command should say
"Transfer files *from* NODE" rather than "to NODE".
2) I don't know about other StarCluster AMIs, but at least ami-4583572c
has *both* openmpi and mpich2 installed, with /etc/alternatives set up
to point to openmpi as the default. I presume the inclusion of both is
intentional, but it can cause some problems, or at least confusion,
because the documentation for the MPICH2 Plugin says, "By default
StarCluster includes OpenMPI. However, for either performance or
compatibility reasons, you may wish to use MPICH2 which provides an
alternate MPI implementation. The MPICH2 plugin will install and
configure MPICH2 on your clusters". A reader
would easily assume from this documentation that the default state of
the AMI does *not* include an installation of MPICH2 at all, when in
fact MPICH2 *is* installed; it is just not configured to be the
default binary called from the command line.
What happened to me was that I was running some software (CADO-NFS for
integer factorization) which has its own internal mechanism for
detecting which brand of MPI is being used. When the perl code found
that /usr/bin/mpich2version existed on the system, it assumed (quite
understandably so) that /usr/bin/mpiexec would be MPICH2, but in reality
/usr/bin/mpiexec was pointing to /usr/bin/mpiexec.openmpi. This caused
the factorization software to crash because it was setting up a faulty
command line using MPICH2-specific flags. Since I had intentionally NOT
activated the mpich2 plugin in the StarCluster config file, it took me
several confused minutes to work out what was going on (I
fixed the problem simply by purging mpich2). I have reported this "bug"
to the developers of the factorization software itself, but thought that
you guys also might want to take note of the apparent discrepancy
between the docs and the installation.
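For what it's worth, the detection that went wrong here can be made more robust by following the mpiexec symlink rather than testing whether mpich2version exists. Below is a minimal Python sketch of that idea. It assumes the Debian-style suffixed binary names seen on this AMI (mpiexec.openmpi is from my system; the .mpich2 and .mpich suffixes are my assumption for the MPICH2 counterpart), and detect_mpi_flavor is a name I made up for illustration. It is not taken from CADO-NFS or StarCluster.

```python
import os
import shutil


def detect_mpi_flavor(launcher="mpiexec"):
    """Guess which MPI implementation a launcher name really resolves to.

    Assumption: the alternatives system links the launcher to a binary whose
    name carries a suffix such as ".openmpi" or ".mpich2".
    """
    # Accept either a bare command name (looked up on PATH) or a full path.
    path = launcher if os.path.sep in launcher else shutil.which(launcher)
    if path is None:
        return None  # no such launcher installed
    # Follow every symlink in the chain, e.g.
    # /usr/bin/mpiexec -> /etc/alternatives/mpiexec -> /usr/bin/mpiexec.openmpi
    target = os.path.basename(os.path.realpath(path))
    if target.endswith(".openmpi"):
        return "openmpi"
    if target.endswith(".mpich2") or target.endswith(".mpich"):
        return "mpich2"
    return "unknown"
```

On the AMI described above, calling detect_mpi_flavor() should report OpenMPI even though mpich2version is present, since only the symlink target is consulted.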
3) Please add "localhost" to the ssh known_hosts file for all accounts
set up on all nodes, unless there is some reason not to.
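To illustrate what I mean, here is a rough Python sketch of how the localhost key could be added for one account. It relies on the standard OpenSSH ssh-keyscan tool; the function names merge_known_hosts and add_localhost_key are my own invention, and I am assuming plain (unhashed) known_hosts entries, so treat it as a sketch rather than how StarCluster would actually do it.

```python
import os
import subprocess


def merge_known_hosts(existing_text, scanned_text):
    """Return existing_text plus any scanned key lines not already present."""
    seen = set(line.strip() for line in existing_text.splitlines())
    out = existing_text
    if out and not out.endswith("\n"):
        out += "\n"
    for line in scanned_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and keys we already have.
        if not line or line.startswith("#") or line in seen:
            continue
        out += line + "\n"
        seen.add(line)
    return out


def add_localhost_key(known_hosts_path):
    """Scan localhost's host keys and append new ones to known_hosts_path."""
    scanned = subprocess.run(
        ["ssh-keyscan", "localhost"],
        capture_output=True, text=True, check=True,
    ).stdout
    existing = ""
    if os.path.exists(known_hosts_path):
        with open(known_hosts_path) as fh:
            existing = fh.read()
    with open(known_hosts_path, "w") as fh:
        fh.write(merge_known_hosts(existing, scanned))
```

Running add_localhost_key on /home/$CLUSTER_USER/.ssh/known_hosts (and on root's equivalent) on each node would cover the case I ran into. The merge step is kept separate so that running it twice does not duplicate entries.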
4) Also on the topic of ssh configuration: shouldn't the directory
/home/$CLUSTER_USER/.ssh belong to CLUSTER_USER himself? From what I
have seen, the present setup makes the files *inside* .ssh/ owned by
CLUSTER_USER, but this directory itself belongs to root, which forces an
extra step of work if I want to set up an ssh config file for CLUSTER_USER.
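In case it helps to reproduce, the ownership mismatch can be checked (and corrected) with something like the Python sketch below. It assumes that the home directory itself is owned by CLUSTER_USER, and the function names are hypothetical ones of mine, not part of StarCluster.

```python
import os


def ssh_dir_owner_mismatch(home_dir):
    """Return True if home_dir/.ssh has a different owner than home_dir."""
    ssh_dir = os.path.join(home_dir, ".ssh")
    return os.stat(ssh_dir).st_uid != os.stat(home_dir).st_uid


def fix_ssh_dir_owner(home_dir):
    """Give .ssh to the owner of home_dir and restrict it to mode 0700.

    Changing the owner to a different user normally requires root.
    """
    ssh_dir = os.path.join(home_dir, ".ssh")
    st = os.stat(home_dir)
    os.chown(ssh_dir, st.st_uid, st.st_gid)
    os.chmod(ssh_dir, 0o700)
```

On my nodes, ssh_dir_owner_mismatch on the cluster user's home directory would return True until the directory is handed over to CLUSTER_USER.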
5) The help for the sshnode command claims the ability to use
"shorthand", such that the user can call
starcluster sn mycluster 1 instead of starcluster sn mycluster node001.
However, while the longhand command in this example works just fine for
me, if I try the abbreviated version I get "!!! ERROR - node '1' does
I'll be glad to supply more details if any of the above are specific
to me rather than general issues.
Finally, thanks and congrats on a great product (nitpicks aside).
StarCluster gets an awful lot done from a profoundly simple interface,
with clear and nicely presented documentation, so that I can focus on my
interest area of actually *running* my programs rather than on the
mechanics of setting up clusters in order to be able to run them. I see
that a lot of folks are using StarCluster for biological applications,
etc., and as a mathematician I would testify that StarCluster is an
important tool to have in the arsenal for factoring large numbers as well!
Received on Wed Jul 25 2012 - 11:39:12 EDT