Classification: UNCLASSIFIED
Caveats: FOUO
Ron,
I just forwarded the entire e-mail to the list. For some reason, my messages to starcluster_at_mit.edu are being held until a moderator decides to accept them, with the reason "Post by non-member to a members-only list", even though I joined the list a couple of days ago. Perhaps it takes some time for my membership to become active? The message I received from "starcluster-bounces_at_mit.edu" is attached.
Tom Oppe
-----Original Message-----
From: Ron Chen [mailto:ron_chen_123_at_yahoo.com]
Sent: Monday, November 26, 2012 8:32 PM
To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
Subject: Re: [StarCluster] Installing Intel compilers on Starcluster (UNCLASSIFIED)
Please also include the list.
-Ron
----- Original Message -----
From: "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" <Thomas.C.Oppe_at_erdc.dren.mil>
To: Ron Chen <ron_chen_123_at_yahoo.com>
Cc:
Sent: Monday, November 26, 2012 5:55 PM
Subject: RE: [StarCluster] Installing Intel compilers on Starcluster (UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: FOUO
Ron,
Thank you for the pointers. Indeed, root did not have passwordless "ssh" access to "node001", but user "admin" did, after I did:
su - admin
where "admin" was one of my CLUSTER_USER names:
# create the following user on the cluster CLUSTER_USER = admin
I was able to install the Intel Cluster compiler suite on "master". Unfortunately, I terminated the cluster late last night when I could not grow it beyond the two nodes "master" and "node001". The command
starcluster addnode -n 19 myCluster
seemed to hang at the step "verifying that ssh is running on all nodes". In retrospect, this problem may have been related to the same ssh problem that root had.
Thus we are back at square 1, trying to bring up a 2-node cluster.
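Next time, before trying to grow the cluster, I plan to confirm that root really does have passwordless ssh to every existing node, e.g. from "master" (just a quick sanity check, nothing more):
ssh -o BatchMode=yes node001 hostname || echo "root ssh to node001 is broken"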
Attached is a train of e-mails between me and a colleague, Mrs. Carrie Leach, about our troubles. Our biggest problem is not being able to bring up the cluster again. Other problems, minor at this time, are:
(1) If codes are compiled by the Intel compilers with the AVX architecture flags "-xAVX", "-xCORE-AVX-I", or "-xCORE-AVX2", they refuse to run on "cc2.8xlarge" instances even though these instances are SandyBridge and thus should support the AVX instruction set. "-xHost" produces a usable executable, but it appears to be equivalent to "-xSSE4.2", which targets the Nehalem chip. So perhaps the 30-day Intel license is not the full-blown compiler?
(2) The most recent GNU compilers on "cc2.8xlarge" instances seem to be 4.4.0 for "gcc" and "g++" and 4.1.x for "gfortran". Is there a way to load the most recent GNU compilers (4.7.2) using a plugin or some package that StarCluster understands? I have downloaded the GNU 4.7.2 tar file from one of the GNU mirror sites, but installing it from source looks like a major effort; my rough plan is sketched below.
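For reference, the plan is the standard out-of-tree GCC build (not tried yet; the exact tarball name and the /sharedWork install prefix are my own assumptions):
tar xf gcc-4.7.2.tar.bz2
cd gcc-4.7.2
./contrib/download_prerequisites   # pulls GMP/MPFR/MPC into the source tree
cd .. && mkdir gcc-build && cd gcc-build
../gcc-4.7.2/configure --prefix=/sharedWork/gcc-4.7.2 --enable-languages=c,c++,fortran --disable-multilib
make -j 16 && make install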
If after reading all this, you have any advice, I would appreciate your help.
Thank you.
Tom Oppe
-----Original Message-----
From: Ron Chen [mailto:ron_chen_123_at_yahoo.com]
Sent: Monday, November 26, 2012 3:30 PM
To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor; starcluster_at_mit.edu
Subject: Re: [StarCluster] Installing Intel compilers on Starcluster (UNCLASSIFIED)
Does passwordless ssh work?
http://star.mit.edu/cluster/docs/latest/manual/getting_started.html#verify-passwordless-ssh
Also, see this Intel MPI FAQ:
http://software.intel.com/en-us/articles/intel-cluster-toolkit-installation-faq/#5
-Ron
________________________________
From: "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" <Thomas.C.Oppe_at_erdc.dren.mil>
To: "starcluster_at_mit.edu" <starcluster_at_mit.edu>
Sent: Saturday, November 24, 2012 6:56 AM
Subject: [StarCluster] Installing Intel compilers on Starcluster (UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: FOUO
Dear Sir:
I am trying to install the Intel compilers on a Starcluster cluster on the Amazon Cloud using a free 30-day trial license. Basically I would like to take advantage of the AVX instruction set that the SandyBridge processors use on “cc2.8xlarge” instances. I was wondering if anyone had experience in installing the Intel compilers.
If an installation is successful, will I be using Intel MPI or OpenMPI?
I am starting with a 2-node cluster, so I created a “machines.LINUX” file with my two node names:
master
node001
but when I get to the command:
./sshconnectivity.exp machines.LINUX
it asks me for a password for “node001” which I don’t know. Then I tried to reduce the “machines.LINUX” file to just
master
and the installation "worked" in the sense that no error messages resulted. However, I am worried that the MPI executables that I create with "mpiifort", "mpiicc", etc. will only run on the "master" node. I will be benchmarking the Amazon Cloud across a wide range of process counts, so it seems unreasonable to have to install the Intel compilers for each cluster size.
Should I close down the “node001” instance, and then add the desired number of nodes using
starcluster addnode -n <# additional nodes> mycluster
and starcluster will duplicate what I have done on “master”?
Finally, will I need a hostlist file (or “mpd.hosts” file) to run my MPI benchmarks under SGE? What is the format of the “mpd.hosts” file?
Thank you for any information.
Please excuse my novice-level questions. I appreciate any help.
Tom Oppe
Classification: UNCLASSIFIED
Caveats: FOUO
_______________________________________________
StarCluster mailing list
StarCluster_at_mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster
Classification: UNCLASSIFIED
Caveats: FOUO
Classification: UNCLASSIFIED
Caveats: FOUO
Carrie,
I'm going home, I'm so tired after having stayed up all night.
Also, a disaster occurred. I tried to grow the cluster, but it was taking so long that I control-C-ed out of it. Then I took down the added nodes, then "node001", then "master" (dumb), and then "cluster" (DUMB!). Now I can't recall how to start up a cluster of size 2.
If you will look at the EC2 console, we have a new "m1.small" instance called "cluster":
ec2-50-17-120-236.compute-1.amazonaws.com
From that node's home directory, "~/.starcluster/config" is supposed to contain our cluster template "config", but I can't get the cluster started.
ec2-user_at_ip-10-244-157-167 > starcluster start myCluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu
!!! ERROR - Unable to load plugins: no master node found
!!! ERROR - No master node found!
ec2-user_at_ip-10-244-157-167 >
Our EBS volume is safe, I think.
If you can figure out how to restart our cluster, I would appreciate it. I am going home.
Starcluster documentation:
http://star.mit.edu/cluster/docs/latest/index.html#
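If it helps, my best recollection of the commands to get a fresh 2-node cluster back is below (not tried since I took everything down, so treat it as a sketch):
starcluster listclusters                # see what StarCluster thinks still exists
starcluster terminate myCluster         # clean out the half-dead cluster entry, if any
starcluster start -s 2 myCluster        # start a new 2-node cluster from the config template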
With me you have to take the disasters along with the small victories.
I'm very sorry.
Tom Oppe
-----Original Message-----
From: Leach, Carrie L Mrs CTR USA USACE USA [mailto:carrie.l.leach2.ctr_at_us.army.mil]
Sent: Monday, November 26, 2012 6:01 AM
To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
Subject: RE: Status on Amazon Cloud benchmarking (UNCLASSIFIED)
UNCLASSIFIED
That's awesome. I never would have thought to su to admin.
On 12.11.25, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> Classification: UNCLASSIFIED
> Caveats: FOUO
>
> Carrie,
>
> Well, interesting. If I change to user "admin" instead of root, all works fine.
>
> root_at_master > ssh node001 hostname
> Permission denied (publickey,gssapi-with-mic).
>
> root_at_master > su - admin
> [admin_at_master ~]$ ssh node001 hostname
> node001
>
> So I have password-less login to node001 as user "admin" but not as "root". Go figure.
>
> admin_at_master > mpdboot -v -r ssh -n 2 -f /sharedWork/mpd.hosts
> running mpdallexit on master LAUNCHED mpd on master via
> RUNNING: mpd on master
> LAUNCHED mpd on node001 via master
> RUNNING: mpd on node001
>
> admin_at_master > mpiexec -ppn 16 -n 32 ./a.out
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> a(100) = 100.00000
> ABTP ++++: *** Begin ABTP Timing Report ***
> ABTP ++++:
> ABTP ++++: MPI_Wtime Resolution, MPI_Wtick() = 0.100000E-05
> ABTP ++++: MPI_Wtime Synchronization, MPI_WTIME_IS_GLOBAL = F
> ABTP ++++:
> ABTP ++++: Max MPI_Init timer val = 1353922369.829603 at rank 17
> ABTP ++++: Min MPI_Init timer val = 1353908878.308468 at rank 11
> ABTP ++++: Avg MPI_Init timer val = 1353915624.069039
> ABTP ++++:
> ABTP ++++: Max MPI_Fin timer val = 1353922369.832097 at rank 24
> ABTP ++++: Min MPI_Fin timer val = 1353908878.310081 at rank 11
> ABTP ++++: Avg MPI_Fin timer val = 1353915624.071303
> ABTP ++++:
> ABTP ++++: Max Fin-Init timer val = 0.003390 at rank 0
> ABTP ++++: Min Fin-Init timer val = 0.001613 at rank 11
> ABTP ++++: Avg Fin-Init timer val = 0.002264
> ABTP ++++:
> ABTP ++++: Max Fin - Min Init val = 13491.523629
> ABTP ++++: Min Fin - Max Init val = -13491.519522
> ABTP ++++:
> ABTP ++++: Rank Elapsed Time MPI_Init          MPI_Finalize      Processor Name
> ABTP ++++:    0 0.003390     1353908878.308641 1353908878.312031 master
> ABTP ++++:    1 0.003137     1353908878.308545 1353908878.311682 master
> ABTP ++++:    2 0.002946     1353908878.308626 1353908878.311572 master
> ABTP ++++:    3 0.002571     1353908878.308529 1353908878.311100 master
> ABTP ++++:    4 0.002752     1353908878.308592 1353908878.311344 master
> ABTP ++++:    5 0.002595     1353908878.308582 1353908878.311177 master
> ABTP ++++:    6 0.002645     1353908878.308588 1353908878.311233 master
> ABTP ++++:    7 0.002665     1353908878.308529 1353908878.311194 master
> ABTP ++++:    8 0.001753     1353908878.308600 1353908878.310353 master
> ABTP ++++:    9 0.001735     1353908878.308540 1353908878.310275 master
> ABTP ++++:   10 0.001722     1353908878.308610 1353908878.310332 master
> ABTP ++++:   11 0.001613     1353908878.308468 1353908878.310081 master
> ABTP ++++:   12 0.002039     1353908878.308610 1353908878.310649 master
> ABTP ++++:   13 0.001839     1353908878.308533 1353908878.310372 master
> ABTP ++++:   14 0.001717     1353908878.308594 1353908878.310311 master
> ABTP ++++:   15 0.002134     1353908878.308605 1353908878.310739 master
> ABTP ++++:   16 0.001853     1353922369.829558 1353922369.831411 node001
> ABTP ++++:   17 0.001949     1353922369.829603 1353922369.831552 node001
> ABTP ++++:   18 0.001635     1353922369.829506 1353922369.831141 node001
> ABTP ++++:   19 0.002061     1353922369.829544 1353922369.831605 node001
> ABTP ++++:   20 0.001746     1353922369.829492 1353922369.831238 node001
> ABTP ++++:   21 0.002136     1353922369.829491 1353922369.831627 node001
> ABTP ++++:   22 0.001692     1353922369.829494 1353922369.831186 node001
> ABTP ++++:   23 0.001997     1353922369.829459 1353922369.831456 node001
> ABTP ++++:   24 0.002604     1353922369.829493 1353922369.832097 node001
> ABTP ++++:   25 0.002598     1353922369.829456 1353922369.832054 node001
> ABTP ++++:   26 0.002543     1353922369.829501 1353922369.832044 node001
> ABTP ++++:   27 0.002455     1353922369.829525 1353922369.831980 node001
> ABTP ++++:   28 0.002457     1353922369.829500 1353922369.831957 node001
> ABTP ++++:   29 0.002500     1353922369.829458 1353922369.831958 node001
> ABTP ++++:   30 0.002472     1353922369.829491 1353922369.831963 node001
> ABTP ++++:   31 0.002491     1353922369.829493 1353922369.831984 node001
> ABTP ++++:
> ABTP ++++: ABTP_Timer Overhead (inside MPI) from Rank 0 = 0.005134
> ABTP ++++: ABTP_Timer and MPI_Finalize Overhead from Rank 0 = 0.021000
> ABTP ++++:
> ABTP ++++: *** End ABTP Timing Report ***
>
> The first 16 MPI processes ran on "master", and the second 16 processes ran on "node001".
>
> Tom Oppe
>
>
> -----Original Message-----
> From: Leach, Carrie L Mrs CTR USA USACE USA
> [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> Sent: Sunday, November 25, 2012 11:10 PM
> To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> Subject: RE: Status on Amazon Cloud benchmarking (UNCLASSIFIED)
>
> UNCLASSIFIED
> How strange that you can't get to the AVX instruction set. Did they give you a POC when you downloaded the compiler software from Intel?
> -Carrie
>
> On 12.11.25, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > Classification: UNCLASSIFIED
> > Caveats: FOUO
> >
> > Carrie,
> >
> > I don't know. The command below shows (by "sse4_2") that SSE4.2 is the most advanced instruction set the node reports.
> >
> > root_at_master > cat /proc/cpuinfo | grep -m 1 flags
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> > clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc pni ssse3 cx16 sse4_1
> > sse4_2 popcnt lahf_lm
> >
> > but http://ark.intel.com/products/64595
> >
> > shows that AVX instructions are available with the E5-2670 chip on the cc2.8xlarge instances.
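> >
> > One quick thing worth checking on the instance itself (just a sketch; a count of 0 would suggest the virtualized CPU is simply not exposing AVX to the guest, in which case an Intel -xAVX binary will refuse to run no matter what we do at compile time):
> >
> > grep -c avx /proc/cpuinfo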
> >
> > I can fill in some more times from the table given a couple days ago.
> >
> > The HT run was faster than the ST run for the "T160" test case, but slower than the ST run for the "au" test case. But the same behavior was seen on Diamond.
> >
> > ST = 1 MPI proc/core, so 32 procs total on two EC2 nodes
> > HT = 2 MPI procs/core, so 64 procs total on two EC2 nodes
> >
> >                          Amazon Cloud       Diamond
> > Code    Case   Cores      ST      HT        ST      HT
> >
> > Lammps  T160    32      34006   29080     40668   28541
> > Lammps  au      32      77497   80588     71012   75844
> >
> >
> > Tom Oppe
> >
> >
> >
> > -----Original Message-----
> > From: Leach, Carrie L Mrs CTR USA USACE USA
> > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > Sent: Sunday, November 25, 2012 10:02 PM
> > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > Subject: RE: Status on Amazon Cloud benchmarking (UNCLASSIFIED)
> >
> > UNCLASSIFIED
> > This may be a stupid question, but do you think there's an env var somewhere that (foolishly) lets the user control which cpu type -xHOST targets during compilation?
> >
> >
> >
> > On 12.11.25, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > Classification: UNCLASSIFIED
> > > Caveats: FOUO
> > >
> > > Carrie,
> > >
> > > I will try the mpd instructions you gave below. I have not tried them yet. I'm waiting for a LAMMPS job to end, which should be soon.
> > >
> > > "-xHOST" seems to generate the same executable as "-xSSE4.2", which is the Nehalem chip that we have in Diamond. But the "cc2.8xlarge" are SandyBridge (see http://en.wikipedia</x></x>.org/wiki/Sandybridge ), so I don't understand why we are not generating AVX instructions.
> > >
> > > root_at_master > cpuinfo
> > > Intel(R) Processor information utility, Version 4.1.0 Build
> > > 20120831 Copyright (C) 2005-2012 Intel Corporation. All rights reserved.
> > >
> > > ===== Processor composition =====
> > > Processor name : Intel(R) Xeon(R) E5-2670 0
> > > Packages(sockets) : 2
> > > Cores : 16
> > > Processors(CPUs) : 32
> > > Cores per package : 8
> > > Threads per core : 2
> > >
> > > Tom Oppe
> > >
> > > -----Original Message-----
> > > From: Leach, Carrie L Mrs CTR USA USACE USA
> > > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > > Sent: Sunday, November 25, 2012 7:38 PM
> > > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > > Subject: RE: Status on Amazon Cloud benchmarking (UNCLASSIFIED)
> > >
> > > UNCLASSIFIED
> > > Thanks. Sorry if the mpd stuff wasn't helpful. We're back from my in-laws. I'm putting the boys in bed, so around 8 I can look at anything you'd like me to.
> > > -Carrie
> > >
> > > On 12.11.25, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > > Classification: UNCLASSIFIED
> > > > Caveats: FOUO
> > > >
> > > > Carrie,
> > > >
> > > > Thank you for all the notes on "mpd". I'll try to get it going. If I fail, I might have to do a reinstall or pass the baton to you. As of last night, I could get an Intel MPI executable to run on "master", but not on "master" and "node001".
> > > >
> > > > My SGE script was:
> > > >
> > > > #!/bin/bash
> > > > #
> > > > #$ -cwd
> > > > #$ -l h_rt=6:00:00
> > > > #$ -pe orte 64
> > > > #$ -j y
> > > > #$ -N job
> > > > #$ -V
> > > >
> > > > mpdboot -v -r ssh -n 2 -f /sharedWork/mpd.hosts
> > > > cd /sharedWork/ABTP/app/lammps/util
> > > > time mpiexec -ppn 16 -n 32 ./a.out
> > > > exit
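> > > >
> > > > (Side note: if the host list ever needs to track whichever nodes SGE actually grants, the script could build it from $PE_HOSTFILE instead of using a static file. A rough sketch, not yet tried here:)
> > > >
> > > > # first column of SGE's $PE_HOSTFILE is the hostname granted to this job
> > > > awk '{print $1}' $PE_HOSTFILE | sort -u > /sharedWork/mpd.hosts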
> > > >
> > > > The executable "a.out" was formed by running "mpiifort" on "/sharedWork/ABTP/app/lammps/util/ABTP_timer.f"
> > > >
> > > > The output showed that all 32 MPI processes were running on "master" instead of 16 on "master" and 16 on "node001".
> > > >
> > > > Here are my notes on doing the Intel compiler installation in case you want to try it. Just let me know when you want to try it, and I'll kill my running LAMMPS jobs. It's actually almost completely automated.
> > > >
> > > > (1) You will need to have the "expect" utility installed beforehand. Note that we have the "expect5.45.tar" file in our work directory EBS volume "/sharedWork". The tar file has been untarred in directory "/sharedWork/expect5.45" and compiled. Inside this directory are the results of the build:
> > > >
> > > > expect -- executable
> > > > libexpect5.45.so -- shared object library
> > > >
> > > > When making expect, do:
> > > >
> > > > configure
> > > > make
> > > > make test
> > > > make install
> > > >
> > > > I think the shared object library will appear in the directory
> > > >
> > > > /usr/lib/expect5.45/libexpect5.45.so
> > > >
> > > > and it needs to be copied up a directory so "expect" can find it:
> > > >
> > > > cd /usr/lib/expect5.45
> > > > cp libexpect5.45.so ..
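> > > >
> > > > (An alternative to copying the library up a level, if we would rather not touch /usr/lib itself, is probably to point the loader at the build directory instead; I have not actually tried this here:)
> > > >
> > > > export LD_LIBRARY_PATH=/usr/lib/expect5.45:${LD_LIBRARY_PATH}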
> > > >
> > > > (2) Create a "machines.LINUX" file in directory "/sharedWork/l_ics_2013.0.028" with the hostnames of the nodes in the cluster, in our case:
> > > >
> > > > master
> > > > node001
> > > >
> > > > The last time I tried this, I needed to supply a password for "node001" when I tried to run "sshconnectivity.exp machines.LINUX" in step 7 of "Readme.txt". I didn't know any password, so I eliminated "node001" from "machines.LINUX" and the rest of the installation went smoothly. However, now I can't figure out how to run an MPI code across both nodes, so maybe I need to include "node001" after all.
> > > >
> > > > (3) Our installation directory will be "/sharedWork/intel/", so put the license file in directory "/sharedWork/intel/licenses". Our license file is thus:
> > > >
> > > > /sharedWork/intel/licenses/EVAL_L___V8VP-CVBJCHC3.lic
> > > >
> > > > Thus do:
> > > >
> > > > export INTEL_LICENSE_FILE=/sharedWork/intel/licenses
> > > >
> > > > (4) We are at step 7 of the "Readme.txt" document, so time to do:
> > > >
> > > > ./sshconnectivity.exp machines.LINUX
> > > >
> > > > I still get that same error message about not being able to connect to "node001".
> > > >
> > > > root_at_master > ssh -n node001 ls -aC ~/.ssh
> > > > Permission denied (publickey,gssapi-with-mic).
> > > > Error - User password entry generated permission denied message on node "node001".
> > > > rm -rf ~/.ssh/authorized_keys.exp8.root
> > > > root_at_master > rm -rf ~/.ssh/authorized_keys.exp8.root
> > > > root_at_master > Node count = 2
> > > > Success count = 1
> > > > Error - Secure shell connectivity was not established on all nodes.
> > > > For details, look for the text "Error" in the log output listing "/tmp/sshconnectivity.root.log".
> > > > Version number: $Revision: 259 $
> > > > Version date: $Date: 2012-06-11 23:26:12 +0400 (Mon, 11 Jun 2012) $
> > > >
> > > >
> > > > After removing "node001" from "machines.LINUX", the "sshconnectivity" step runs without error. But will it be a worthwhile installation or just run on master?
> > > >
> > > > (5) We are now at step 8 of "Readme.txt". We have a working "java":
> > > >
> > > > root_at_master > which java
> > > > /usr/bin/java
> > > >
> > > > Thus we are ready to do the installation with:
> > > >
> > > > ./install.sh
> > > >
> > > > This is a long process and you will be prompted to answer questions along the way. The only non-default answer I gave was the installation directory:
> > > >
> > > > /sharedWork/intel
> > > >
> > > > instead of the default installation directory "/opt/intel". Note that the first time around, I tried to use "/opt/intel" as the installation directory, but there is not enough disk space left on /opt to do the install:
> > > >
> > > > root_at_master > df -h /opt
> > > > Filesystem Size Used Avail Use% Mounted on
> > > > /dev/mapper/VolGroup00-LogVol00
> > > > 9.9G 8.1G 1.4G 87% /
> > > >
> > > > At the end of the installation, I added some lines to our ".bashrc" file, of which a copy is maintained in /sharedWork/.bashrc.
> > > >
> > > > # Intel compiler environment variables:
> > > >
> > > > export PATH=".:/sharedWork/intel/bin:/sharedWork/intel/impi/4.1.0.024/bin64:${HOME}/bin:${PATH}"
> > > >
> > > > . /sharedWork/intel/bin/compilervars.sh intel64
> > > > . /sharedWork/intel/bin/iccvars.sh intel64
> > > > . /sharedWork/intel/bin/ifortvars.sh intel64
> > > >
> > > > export INTEL_LICENSE_FILE=/sharedWork/intel/licenses
> > > >
> > > > That's about all I know about installing the Intel compilers.
> > > >
> > > > ############################################################
> > > >
> > > >
> > > > On other topics:
> > > >
> > > > Here is StarCluster's documentation on SGE:
> > > >
> > > > http://star.mit.edu/cluster/docs/latest/guides/sge.html
> > > >
> > > > The default process-to-core mapping can be changed from "round-robin" to "fill-up" (consecutive placement) by:
> > > >
> > > > root_at_master > qconf -mp orte
> > > >
> > > > You're in "vi" now, so change the line:
> > > >
> > > > allocation_rule $round_robin
> > > >
> > > > to
> > > >
> > > > allocation_rule $fill_up
> > > >
> > > > Exit "vi" and verify the change:
> > > >
> > > > root_at_master > qconf -sp orte
> > > > pe_name orte
> > > > slots 64
> > > > user_lists NONE
> > > > xuser_lists NONE
> > > > start_proc_args /bin/true
> > > > stop_proc_args /bin/true
> > > > allocation_rule $fill_up
> > > > control_slaves TRUE
> > > > job_is_first_task FALSE
> > > > urgency_slots min
> > > > accounting_summary FALSE
> > > >
> > > > I have made this change for the "myCluster" cluster, since normally we want a consecutive rather than a round-robin placement strategy.
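> > > >
> > > > (I believe the same change can be made without the interactive vi step by dumping the PE definition to a file, editing it, and loading it back; this is from memory, so double-check that our qconf supports -Mp:)
> > > >
> > > > qconf -sp orte > /tmp/orte.pe                       # dump the current parallel environment
> > > > sed -i 's/\$round_robin/\$fill_up/' /tmp/orte.pe    # switch the allocation rule
> > > > qconf -Mp /tmp/orte.pe                              # load the modified definition back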
> > > >
> > > > Another topic: For the Intel compilers, what should we use for the architecture flag, "-x<arch>" ?
> > > >
> > > > -xAVX does not work, program won't run
> > > >
> > > > -xCORE-AVX2 does not work, program won't run
> > > >
> > > > -xCORE-AVX-I does not work, program won't run
> > > >
> > > > These seem to be the only three choices that involve "AVX" instructions, so how do we get AVX instructions? "-xHost" works, but what are we getting? I have tried all the "verbose" compilation flags I can think of for Intel, and there is no hint of AVX instructions:
> > > >
> > > > ifort -V (upper case V)
> > > > shows version numbers of compiler and linker
> > > >
> > > > ifort -v (lower case v)
> > > > shows options used by compiler and linker
> > > >
> > > > ifort -watch
> > > > shows verbose compiler and link steps
> > > >
> > > > ifort -watch all
> > > > same as using just "-watch"
> > > >
> > > > ifort -dryrun
> > > > same as using "-watch" but no object files or executable is
> > > > produced
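> > > >
> > > > A crude after-the-fact check on whatever executable we build (just a sketch; it assumes binutils' objdump is on the node and that ./a.out is the test binary):
> > > >
> > > > objdump -d ./a.out | grep -c '%ymm'   # a non-zero count means 256-bit AVX (ymm) instructions were emitted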
> > > >
> > > > I hope we are getting a full implementation of the Intel compiler suite with our 30-day trial license. Getting access to the AVX instructions is a major reason for using the Intel compilers.
> > > >
> > > > Tom Oppe
> > > >
> > > > -----Original Message-----
> > > > From: Leach, Carrie L Mrs CTR USA USACE USA
> > > > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > > > Sent: Sunday, November 25, 2012 3:32 PM
> > > > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > > > Subject: RE: Status on Amazon Cloud benchmarking (UNCLASSIFIED)
> > > >
> > > > UNCLASSIFIED
> > > > From my notes, it looks like I was trying to start the mpd daemon last go around and then Mark did it. All I could find from him was a small mention of it from 2012.02.06, "BTW, please let me know if you experience trouble with this. I believe I've brought up the mpd properly; however, I could be sadly mistaken."
> > > > I will search his document to see if I can find where he describes doing it. Otherwise we'll have to ask him or just figure it out.
> > > >
> > > > The following are my notes on setting up the mpd ring, but they are for mpich, not Intel MPI.
> > > >
> > > >
> > > > P. 5 from the mpich install guide:
> > > >
> > > > The first sanity check consists of bringing up a ring of one MPD on
> > > > the local machine, testing one MPD command, and bringing the "ring" down.
> > > >
> > > > mpd &
> > > >
> > > > mpdtrace
> > > >
> > > > mpdallexit
> > > >
> > > > [mahood_at_ip-10-17-210-179 ~]$ mpdtrace
> > > >
> > > > ip-10-17-210-179
> > > >
> > > > [mahood_at_ip-10-17-210-179 ~]$ mpdallexit
> > > >
> > > > The next sanity check is to run a non-MPI program using the daemon.
> > > >
> > > > mpd &
> > > >
> > > > mpiexec -n 1 /bin/hostname
> > > >
> > > > mpdallexit
> > > >
> > > > This should print the name of the machine you are running on. If not,
> > > > you should check Appendix A on troubleshooting MPD.
> > > >
> > > > [mahood_at_ip-10-17-210-179 ~]$ mpiexec -n 1 /bin/hostname
> > > >
> > > > ip-10-17-210-179
> > > >
> > > > (note: hostid command gives 110ab3d2)
> > > >
> > > > step 12: Now we will bring up a ring of mpd's on a set of machines.
> > > > Create a file consisting of a list of machine names, one per line.
> > > > Name this file mpd.hosts. These hostnames will be used as targets for
> > > > ssh or rsh, so include full domain names if necessary. Check that you
> > > > can reach these machines with ssh or rsh without entering a password.
> > > > You can test by doing
> > > >
> > > > ssh othermachine date
> > > >
> > > >
> > > > On 12.11.24, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > > > Classification: UNCLASSIFIED
> > > > > Caveats: FOUO
> > > > >
> > > > > Carrie,
> > > > >
> > > > > When I try to run a simple "Hello World" type program, it says:
> > > > >
> > > > > mpdroot: cannot connect to local mpd at: /tmp/24.1.all.q/mpd2.console_master_root
> > > > > probable cause: no mpd daemon on this machine
> > > > > possible cause: unix socket /tmp/24.1.all.q/mpd2.console_master_root has been removed
> > > > > mpiexec_master (__init__ 1524): forked process failed; status=255
> > > > >
> > > > > Didn't you and Mark have to set up an MPD daemon when running MPICH2?
> > > > >
> > > > > We are now running Intel MPI (not OpenMPI), so the HYCOM scripts will need to be modified since IMPI won't recognize all those OpenMPI environment variables.
> > > > >
> > > > > Also, I think "mpirun" needs to be replaced by "mpiexec" or perhaps "mpiexec.hydra".
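> > > > >
> > > > > If we end up on the hydra launcher, I think the equivalent of our current launch line would look roughly like this (untested; the -f hostfile, -ppn, and -n options are what I expect Intel MPI's hydra to take, and it should not need mpdboot at all):
> > > > >
> > > > > mpiexec.hydra -f /sharedWork/mpd.hosts -ppn 16 -n 32 ./a.out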
> > > > >
> > > > > Tom Oppe
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Leach, Carrie L Mrs CTR USA USACE USA
> > > > > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > > > > Sent: Saturday, November 24, 2012 9:33 PM
> > > > > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > > > > Cc: William Ward
> > > > > Subject: RE: Status on Amazon Cloud benchmarking
> > > > > (UNCLASSIFIED)
> > > > >
> > > > > UNCLASSIFIED
> > > > > Re intel compilers: I think I have actually found the user guide. I had to dig through Intel's webpage, finally found the release notes, and when I downloaded them they had a chapter titled, "Installation and Uninstalling on Linux OS". I'm about to read through it to see if it's helpful. I'm also attaching it in case you are interested.
> > > > > Re LAMMPS runs: Are you using the older gnu compilers? Do you want me to try HYCOM the same way? I don't want to run on top of you.
> > > > > This is getting more fun by the minute.
> > > > > -Carrie
> > > > >
> > > > > On 12.11.23, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > > > > Classification: UNCLASSIFIED
> > > > > > Caveats: FOUO
> > > > > >
> > > > > > Carrie,
> > > > > >
> > > > > > I never figured out what the "ictvars.sh" file is. Is it a file we should have before beginning the installation or is it produced during the installation? From the "Readme.txt" file, it says:
> > > > > >
> > > > > > The "User Checklist for Linux* OS" topic of the Installation
> > > > > > Guide provides details on setting up ictvars.sh within Bourne* Shell.
> > > > > >
> > > > > > Now if I can only find that "Installation Guide" file. I will look into it some more.
> > > > > >
> > > > > > My single-threaded (ST) and hyper-threaded (HT) LAMMPS runs produced contradictory results: The HT run was faster than the ST run for the "T160" test case, but slower than the ST run for the "au" test case. I'm rerunning all four benchmarks on the EC2 and the ST "au" benchmark on Diamond.
> > > > > >
> > > > > > ST = 1 MPI proc/core, so 32 procs total on two EC2 nodes
> > > > > > HT = 2 MPI procs/core, so 64 procs total on two EC2 nodes
> > > > > >
> > > > > >                          Amazon Cloud       Diamond
> > > > > > Code    Case   Cores      ST      HT        ST      HT
> > > > > >
> > > > > > Lammps  T160    32      34006   29080     40668   28541
> > > > > > Lammps  au      32      77537   80588       ???   75844
> > > > > >
> > > > > > To keep jobs from running at the same time on the same
> > > > > > nodes, ask for 32*(number of nodes needed) cores in your SGE scripts.
> > > > > > Thus use in the template file,
> > > > > >
> > > > > > #$ -pe orte XYZASK
> > > > > >
> > > > > > and when "genall" asks for the number of cores per node (a hardware question), answer 32 instead of 16. SGE apparently will try to schedule 32 processes to a node, since there are 32 logical cores per node in hyper-threaded mode even though there are only 16 physical cores.
> > > > > >
> > > > > > Tom Oppe
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Leach, Carrie L Mrs CTR USA USACE USA
> > > > > > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > > > > > Sent: Thursday, November 22, 2012 8:00 AM
> > > > > > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > > > > > Cc: William Ward
> > > > > > Subject: RE: Status on Amazon Cloud benchmarking
> > > > > > (UNCLASSIFIED)
> > > > > >
> > > > > > UNCLASSIFIED
> > > > > > Tom,
> > > > > > Good morning. I wrote you an email last night, but I kept
> > > > > > falling asleep while writing it...this morning I received
> > > > > > the email, addressed to me instead of you. Who does that? :)
> > > > > >
> > > > > > I don't mind trying to install the intel compilers. You had gotten farther than I did on that installation. I stopped when I could not find that ictvars.sh file. Is the license file still at /opt/intel/licenses/?
> > > > > >
> > > > > > I'm so glad you got something running. We had the same trouble before with jobs running on top of each other. With HYCOM, we had to run one job at a time.
> > > > > >
> > > > > > Thanks. I hope you have a wonderful Thanksgiving.
> > > > > > -Carrie
> > > > > >
> > > > > > On 12.11.22, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > > > > > Classification: UNCLASSIFIED
> > > > > > > Caveats: FOUO
> > > > > > >
> > > > > > > Carrie,
> > > > > > >
> > > > > > > If you'd like to try installing the Intel compilers, I will try installing the gcc/g++/gfortran 4.7.2 compilers.
> > > > > > >
> > > > > > > Or if you want to try GNU, I can try Intel. Let me know which compiler you want to try installing. Either installation will be a job, I'm afraid.
> > > > > > >
> > > > > > > Another nasty surprise with SGE: It will allow jobs to run on the same nodes concurrently. I had queued two 32-process jobs, each using 2 nodes with 16 processes per node. Since hyper-threading is enabled, there are 16 physical cores but 32 logical cores per node. Since my jobs each required only 16 cores per node (master + node001), the scheduler started them at the same time, running 16 procs of each job on master and similarly for node001. Of course, they were competing with each other since there are only 16 physical cores per node.
> > > > > > >
> > > > > > > A web page claims that you can request exclusive use of
> > > > > > > the nodes with
> > > > > > >
> > > > > > > qsub -l exclusive=true job
> > > > > > >
> > > > > > > or, abbreviated,
> > > > > > >
> > > > > > > qsub -l excl=true job
> > > > > > >
> > > > > > > which I put in the script itself as:
> > > > > > >
> > > > > > > #$ -l excl=true
> > > > > > >
> > > > > > > But SGE said that it did not recognize the "exclusive" or "excl" keywords, and it made the same complaint when I used the qsub commands above.
> > > > > > >
> > > > > > > My only workaround at this stage is for each job to request both nodes in hyperthreaded mode:
> > > > > > >
> > > > > > > #$ -pe orte 64
> > > > > > >
> > > > > > > mpirun -np 32 -npernode 16 ./a.out
> > > > > > >
> > > > > > >
> > > > > > > Our first data points, but using the g++ 4.4.0 compiler, are:
> > > > > > >
> > > > > > > Code Case Cores Procs Time
> > > > > > >
> > > > > > > lammps T160 32 64 29080
> > > > > > > lammps au 32 64 80588
> > > > > > >
> > > > > > > On Diamond, lammps_T160 running hyper-threaded on 32 cores (64 procs) takes 28541 seconds, very close to what was obtained above.
> > > > > > >
> > > > > > > On Diamond, lammps_au running hyper-threaded on 128 cores (256 procs) takes 22736 seconds. Assuming linear scaling, on 32 cores (64 procs), it would take 4*22736 = 90,944 seconds.
> > > > > > >
> > > > > > > There are many caveats: these are 2-node jobs, LAMMPS does not use a lot of memory and is very load-imbalanced, GNU 4.4.0 does not have AVX instructions, etc.
> > > > > > >
> > > > > > > Tom Oppe
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Leach, Carrie L Mrs CTR USA USACE USA
> > > > > > > [mailto:carrie.l.leach2.ctr_at_us.army.mil]
> > > > > > > Sent: Wednesday, November 21, 2012 5:52 AM
> > > > > > > To: Oppe, Thomas C ERDC-RDE-ITL-MS Contractor
> > > > > > > Subject: Re: Status on Amazon Cloud benchmarking
> > > > > > > (UNCLASSIFIED)
> > > > > > >
> > > > > > > UNCLASSIFIED
> > > > > > > Ok, thank you. Which compiler suite would you like me to look at?
> > > > > > > -Carrie
> > > > > > >
> > > > > > > On 12.11.21, "Oppe, Thomas C ERDC-RDE-ITL-MS Contractor" wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Classification: UNCLASSIFIED
> > > > > > > > Caveats: FOUO
> > > > > > > >
> > > > > > > >
> > > > > > > > Carrie,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I have four jobs queued on the Amazon Cloud cluster, one of which is running. All four jobs are scheduled to run on two 16-core SandyBridge instances, thus 32 physical cores. These are just testing runs to see if OpenMPI and the GNU compilers are generating correct executables and if hyper-threading is turned on as Amazon claims.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > LAMMPS “au” is running on 32 cores. One job is a 32-proc single-threaded run. The other job (now running) is a 64-proc hyper-threaded job (i.e., 32 procs assigned to each 16-core node).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > There are another two jobs to run the LAMMPS T160 test case under the same circumstances.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > It is hard to tell how long a job has been running. SGE does not give you this information. By the way, neither does LoadLeveler on the IBM PERCS platform.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Tasks to do over the holidays:
> > > > > > > >
> > > > > > > > (1) Install the GNU gfortran/gcc/g++ version 4.7.2 compilers, which have the AVX instruction set. The above runs are being done with executables generated by g++ 4.4.0, which does not have AVX. The 4.7.2 distribution is on /sharedWork. We can run gnu-compiled codes with either OpenMPI or MPICH2.
> > > > > > > >
> > > > > > > > (2) Install the Intel compiler suite. With Intel, we will be running IMPI (Intel MPI, based on MPICH2). Intel, of course, has the AVX instructions.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > If one or both of these tasks is done, we can start our benchmarking runs of HYCOM and GAMESS at low process counts, and I can run HPCC at a presumably high process count.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > When we install the compilers, we have to make sure that all files (executables, libraries, source code, everything) stay in /sharedWork, since that is the only permanent disk space that we have. If we put any compiler executables on the master node's file systems (/bin, /usr/bin, /usr/local/bin, or somewhere in $HOME), then we risk losing that work if the master node crashes.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Tom Oppe
> > > > > > > >
> > > > > > > >
> > > > > > > > Classification: UNCLASSIFIED
> > > > > > > > Caveats: FOUO
> > > > > > > UNCLASSIFIED
> > > > > > >
> > > > > > > Classification: UNCLASSIFIED
> > > > > > > Caveats: FOUO
> > > > > > UNCLASSIFIED
> > > > > >
> > > > > > Classification: UNCLASSIFIED
> > > > > > Caveats: FOUO
> > > > > UNCLASSIFIED
> > > > >
> > > > > Classification: UNCLASSIFIED
> > > > > Caveats: FOUO
> > > > UNCLASSIFIED
> > > >
> > > > Classification: UNCLASSIFIED
> > > > Caveats: FOUO
> > > UNCLASSIFIED
> > >
> > > Classification: UNCLASSIFIED
> > > Caveats: FOUO
> > UNCLASSIFIED
> >
> > Classification: UNCLASSIFIED
> > Caveats: FOUO
> UNCLASSIFIED
>
> Classification: UNCLASSIFIED
> Caveats: FOUO
UNCLASSIFIED
Classification: UNCLASSIFIED
Caveats: FOUO
Classification: UNCLASSIFIED
Caveats: FOUO
Received on Mon Nov 26 2012 - 21:59:03 EST