StarCluster - Mailing List Archive

Re: compiling MPI applications on starcluster

From: Justin Riley <no email>
Date: Mon, 28 Apr 2014 10:43:53 -0400

Gonçalo,

Ah, I thought this sounded familiar:

https://github.com/jtriley/StarCluster/issues/370

Thanks for responding. This will be fixed in the upcoming 14.04 AMIs.

Torstein, you can update the MPI links interactively by running the
following commands as root:

$ update-alternatives --config mpi
$ update-alternatives --config mpirun

At the interactive prompts, select either all OpenMPI paths or all MPICH
paths, so that the compiler wrappers and mpirun come from the same
implementation.
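
If you prefer to avoid the interactive prompts, the same change can be
made non-interactively with --set. A minimal sketch, using the paths
shown in the update-alternatives output quoted below (verify them on
your AMI first):

$ update-alternatives --set mpi /usr/lib/openmpi/include
$ update-alternatives --set mpirun /usr/bin/mpirun.openmpi

After relinking your application, "ldd pw.x" should list libmpi.so from
Open MPI rather than libmpich.so.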

~Justin

On Mon, Apr 28, 2014 at 02:06:50PM +0200, Gonçalo Albuquerque wrote:
> Hi,
> When using AMI ami-6b211202 in us-east I stumbled across the same issue
> you're experiencing.
> The symbolic links in the alternatives system are mixing MPICH and
> OpenMPI:
> root@master:/etc/alternatives# update-alternatives --display mpi
> mpi - auto mode
>   link currently points to /usr/include/mpich2
> /usr/include/mpich2 - priority 40
>   slave libmpi++.so: /usr/lib/libmpichcxx.so
>   slave libmpi.so: /usr/lib/libmpich.so
>   slave libmpif77.so: /usr/lib/libfmpich.so
>   slave libmpif90.so: /usr/lib/libmpichf90.so
>   slave mpic++: /usr/bin/mpic++.mpich2
>   slave mpic++.1.gz: /usr/share/man/man1/mpic++.mpich2.1.gz
>   slave mpicc: /usr/bin/mpicc.mpich2
>   slave mpicc.1.gz: /usr/share/man/man1/mpicc.mpich2.1.gz
>   slave mpicxx: /usr/bin/mpicxx.mpich2
>   slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.mpich2.1.gz
>   slave mpif77: /usr/bin/mpif77.mpich2
>   slave mpif77.1.gz: /usr/share/man/man1/mpif77.mpich2.1.gz
>   slave mpif90: /usr/bin/mpif90.mpich2
>   slave mpif90.1.gz: /usr/share/man/man1/mpif90.mpich2.1.gz
> /usr/lib/openmpi/include - priority 40
>   slave libmpi++.so: /usr/lib/openmpi/lib/libmpi_cxx.so
>   slave libmpi.so: /usr/lib/openmpi/lib/libmpi.so
>   slave libmpif77.so: /usr/lib/openmpi/lib/libmpi_f77.so
>   slave libmpif90.so: /usr/lib/openmpi/lib/libmpi_f90.so
>   slave mpiCC: /usr/bin/mpic++.openmpi
>   slave mpiCC.1.gz: /usr/share/man/man1/mpiCC.openmpi.1.gz
>   slave mpic++: /usr/bin/mpic++.openmpi
>   slave mpic++.1.gz: /usr/share/man/man1/mpic++.openmpi.1.gz
>   slave mpicc: /usr/bin/mpicc.openmpi
>   slave mpicc.1.gz: /usr/share/man/man1/mpicc.openmpi.1.gz
>   slave mpicxx: /usr/bin/mpic++.openmpi
>   slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.openmpi.1.gz
>   slave mpif77: /usr/bin/mpif77.openmpi
>   slave mpif77.1.gz: /usr/share/man/man1/mpif77.openmpi.1.gz
>   slave mpif90: /usr/bin/mpif90.openmpi
>   slave mpif90.1.gz: /usr/share/man/man1/mpif90.openmpi.1.gz
> Current 'best' version is '/usr/include/mpich2'.
> root@master:/etc/alternatives# update-alternatives --display mpirun
> mpirun - auto mode
>   link currently points to /usr/bin/mpirun.openmpi
> /usr/bin/mpirun.mpich2 - priority 40
>   slave mpiexec: /usr/bin/mpiexec.mpich2
>   slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.mpich2.1.gz
>   slave mpirun.1.gz: /usr/share/man/man1/mpirun.mpich2.1.gz
> /usr/bin/mpirun.openmpi - priority 50
>   slave mpiexec: /usr/bin/mpiexec.openmpi
>   slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.openmpi.1.gz
>   slave mpirun.1.gz: /usr/share/man/man1/mpirun.openmpi.1.gz
> Current 'best' version is '/usr/bin/mpirun.openmpi'.
> You are indeed compiling with MPICH and trying to run with OpenMPI. The
> solution is to change the symbolic links using the update-alternatives
> command. For the runtime link (mpirun), this must be done on all the
> nodes of the cluster, e.g. as sketched below.
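> A rough sketch, assuming the default master/node001/... node names and
> passwordless root SSH between nodes (adjust the host list to your cluster):
> $ for h in master node001; do ssh $h update-alternatives --set mpirun /usr/bin/mpirun.openmpi; done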
> No doubt this will be corrected in upcoming versions of the AMIs.
> Regards,
> Gonçalo
>
> On Mon, Apr 28, 2014 at 1:09 PM, Torstein Fjermestad
> <[1]tfjermestad_at_gmail.com> wrote:
>
> Dear Justin,
>  
> During compilation, the cluster consisted only of the master node,
> which is of instance type c3.large. In order to run a test parallel
> calculation, I added a node of instance type c3.4xlarge (16 processors).
>
> The cluster is created from the following AMI:
> [0] ami-044abf73 eu-west-1 starcluster-base-ubuntu-13.04-x86_64 (EBS)
>
> Executing the application outside the queuing system like
>
> mpirun -np 2 -hostfile hosts ./pw.x -in inputfile.inp
>
> did not change anything.
>
> The output of the command "mpirun --version" is the following:
>
> mpirun (Open MPI) 1.4.5
>
> Report bugs to [2]http://www.open-mpi.org/community/help/
>
> After investigating the matter a little, I found that mpif90 likely
> belongs to a different MPI implementation than mpirun.
> The first line of the output of the command "mpif90 -v" is the
> following:
>  
> mpif90 for MPICH2 version 1.4.1
>
> Furthermore, the output of the command "ldd pw.x" indicates that pw.x is
> compiled with mpich2 and not with Open MPI. The output is the following:
>  
> linux-vdso.so.1 =>  (0x00007fffd35fe000)
>     liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007ff38fb18000)
>     libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007ff38e2f5000)
>     libmpich.so.3 => /usr/lib/libmpich.so.3 (0x00007ff38df16000)
>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff38dcf9000)
>     libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff38d9e5000)
>     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff38d6df000)
>     libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff38d4c9000)
>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff38d100000)
>     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff38cef7000)
>     libcr.so.0 => /usr/lib/libcr.so.0 (0x00007ff38cced000)
>     libmpl.so.1 => /usr/lib/libmpl.so.1 (0x00007ff38cae8000)
>     /lib64/ld-linux-x86-64.so.2 (0x00007ff390820000)
>     libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff38c8b2000)
>     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff38c6ae000)
>
> The feedback I got from the Quantum Espresso mailing list suggested that
> the cause of the error could be that pw.x (the executable) was not
> compiled with the same MPI implementation as mpirun.
> The outputs of the commands "mpirun --version", "mpif90 -v" and "ldd
> pw.x" above have led me to suspect that this is indeed the case.
> I therefore wonder whether it is possible to control which MPI
> implementation I compile my applications with.
>
> If, with the current MPI installation, applications are compiled with a
> different MPI implementation than mpirun, then I will likely have
> similar problems when compiling other applications as well. I would
> therefore very much appreciate it if you could give me some hints on
> how I can solve this problem.
>
> Thanks in advance.
>
> Regards,
> Torstein
>
> On Thu, Apr 24, 2014 at 5:13 PM, Justin Riley <[3]jtriley_at_mit.edu>
> wrote:
>
> Hi Torstein,
>
> Can you please describe your cluster configuration (i.e. size, image
> id(s), instance type(s))? Also, you're currently using the SGE/OpenMPI
> integration. Have you tried just using mpirun directly, as described in
> the first part of:
>
> [4]http://star.mit.edu/cluster/docs/latest/guides/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
>
> Also, what does 'mpirun --version' show?
>
> ~Justin
> On Thu, Apr 17, 2014 at 07:19:28PM +0200, Torstein Fjermestad wrote:
> >    Dear all,
> >
> >    I recently tried to compile an application (Quantum Espresso,
> >    [1][5]http://www.quantum-espresso.org/) to be used for parallel
> >    computations on StarCluster. The installation procedure of the
> >    application consists of the standard "./configure + make" steps.
> >    At the end of the output from ./configure, the statement "Parallel
> >    environment detected successfully. Configured for compilation of
> >    parallel executables." appears.
> >
> >    The compilation with "make" completes without errors. I then run
> >    the application in the following way:
> >
> >    I first write a submit script (submit.sh) with the following
> >    content:
> >
> >    cp /path/to/executable/pw.x .
> >    mpirun ./pw.x -in input.inp
> >
> >    I then submit the job to the queueing system with the following
> >    command:
> >
> >    qsub -cwd -pe orte 16 ./submit.sh
> >
> >    However, in the output of the calculation, the following line is
> >    repeated 16 times:
> >
> >    Parallel version (MPI), running on 1 processors
> >
> >    It therefore seems like the program runs 16 one-processor
> >    calculations that all write to the same output.
> >
> >    I wrote about this problem to the mailing list of Quantum Espresso,
> >    and I got the suggestion that perhaps the mpirun belonged to a
> >    different MPI library than the one pw.x (a particular package of
> >    Quantum Espresso) was compiled with.
> >
> >    I compiled pw.x on the same cluster as I executed mpirun. Are there
> >    several versions of Open MPI on the AMIs provided by StarCluster?
> >    In that case, how can I choose the correct one?
> >
> >    Perhaps the problem has a different cause. Does anyone have
> >    suggestions on how to solve it?
> >
> >    Thanks in advance for your help.
> >
> >    Yours sincerely,
> >    Torstein Fjermestad
> >
> > References
> >
> >    Visible links
> >    1. [6]http://www.quantum-espresso.org/
>
> > _______________________________________________
> > StarCluster mailing list
> > [7]StarCluster_at_mit.edu
> > [8]http://mailman.mit.edu/mailman/listinfo/starcluster
>
> _______________________________________________
> StarCluster mailing list
> [9]StarCluster_at_mit.edu
> [10]http://mailman.mit.edu/mailman/listinfo/starcluster
>
> References
>
> Visible links
> 1. mailto:tfjermestad_at_gmail.com
> 2. http://www.open-mpi.org/community/help/
> 3. mailto:jtriley_at_mit.edu
> 4. http://star.mit.edu/cluster/docs/latest/guides/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
> 5. http://www.quantum-espresso.org/
> 6. http://www.quantum-espresso.org/
> 7. mailto:StarCluster_at_mit.edu
> 8. http://mailman.mit.edu/mailman/listinfo/starcluster
> 9. mailto:StarCluster_at_mit.edu
> 10. http://mailman.mit.edu/mailman/listinfo/starcluster

> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster




Received on Mon Apr 28 2014 - 10:43:56 EDT
This archive was generated by hypermail 2.3.0.
