StarCluster - Mailing List Archive

Re: [Starcluster] SOLVED!! instance ssh problem...

From: Justin Riley <no email>
Date: Wed, 31 Mar 2010 00:20:34 -0400

Hi Nicholas,

Hmmm, this is strange. I don't see the issue with the architecture attribute
when using Eucalyptus. Could you verify that ec2.conn.get_image is returning
None by running the following:

$ ipython
~> from starcluster.config import get_config
~> cfg = get_config(); cfg.load()
~> ec2 = cfg.get_easy_ec2()
~> print ec2.conn.get_image('emi-A286143C')

Also, what does ec2.conn.get_all_images() return?
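
If get_image does return None, the AttributeError in __check_platform follows
directly, because the code reads .architecture off the result. A minimal
sketch of a guarded lookup is below. The helper name get_image_architecture,
its default argument, and the FakeImage/FakeImageConn stand-ins are
hypothetical and exist only so the snippet runs without boto or a Eucalyptus
endpoint. The only call assumed on the connection is get_image(image_id), as
used above.

```python
class FakeImage(object):
    """Stand-in for an image object (hypothetical, illustration only)."""

    def __init__(self, architecture):
        self.architecture = architecture


class FakeImageConn(object):
    """Stand-in connection whose get_image returns None for unknown ids."""

    def __init__(self, images):
        self._images = images

    def get_image(self, image_id):
        return self._images.get(image_id)


def get_image_architecture(conn, image_id, default=None):
    """Return the image's architecture, or `default` if the lookup fails."""
    image = conn.get_image(image_id)
    if image is None:
        # This is the case that raises AttributeError in the traceback.
        return default
    return getattr(image, 'architecture', default)


conn = FakeImageConn({'emi-known': FakeImage('x86_64')})
print(get_image_architecture(conn, 'emi-known'))
print(get_image_architecture(conn, 'emi-missing', default='unknown'))
```

Returning a default rather than raising is only one option; raising a clear
"image not found" error would work just as well.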

In any event, I've fixed the second error you encountered (the IndexError in
get_group_or_none) in github, and this should get you to the point that I was
discussing in my last post: "waiting for cluster to start".
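
For context, the IndexError in the traceback quoted below is what you get when
[0] is applied to an empty list of security groups. The following is a rough
sketch of a lookup that returns None instead. It is not necessarily what the
github fix does. FakeGroup and FakeGroupConn are hypothetical stand-ins so the
snippet runs on its own, and the only connection call assumed is
get_all_security_groups(groupnames=[name]), as it appears in the quoted code.

```python
class FakeGroup(object):
    """Stand-in for a security group object (hypothetical)."""

    def __init__(self, name):
        self.name = name


class FakeGroupConn(object):
    """Stand-in connection that filters groups by name (hypothetical)."""

    def __init__(self, names):
        self._groups = [FakeGroup(n) for n in names]

    def get_all_security_groups(self, groupnames=None):
        if groupnames is None:
            return list(self._groups)
        return [g for g in self._groups if g.name in groupnames]


def get_group_or_none(conn, name):
    """Return the group called `name`, or None if no such group exists."""
    groups = conn.get_all_security_groups(groupnames=[name])
    # Filter again by name in case the backend ignores the groupnames filter.
    matches = [g for g in groups if g.name == name]
    # Indexing an empty list is what raises IndexError, so check first.
    return matches[0] if matches else None


group_conn = FakeGroupConn(['default'])
print(get_group_or_none(group_conn, 'default').name)
print(get_group_or_none(group_conn, 'no-such-group'))
```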

The security groups will be created and the instances will be started.
However, the wait condition will never be met because of the issues discussed
in the last post regarding eucalyptus/boto and instance ip addresses in the
euca-describe-instances response.

I have an idea for how to get around this but I need to think a bit about how
to incorporate it without making a mess of the code. I'll let you know when I
have something worth testing in github.
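
To illustrate the problem, the earlier post quoted below notes that Eucalyptus
places the address information in dns_name and private_dns_name, while
ip_address and private_ip_address come back as N/A. The sketch below shows one
possible fallback under those assumptions. It is not necessarily the
workaround referred to above. FakeInstance, first_usable and
instance_addresses are hypothetical names for illustration, and treating both
None and the string 'N/A' as "missing" is an assumption.

```python
class FakeInstance(object):
    """Stand-in for an instance object (hypothetical, illustration only)."""

    def __init__(self, ip_address=None, private_ip_address=None,
                 dns_name=None, private_dns_name=None):
        self.ip_address = ip_address
        self.private_ip_address = private_ip_address
        self.dns_name = dns_name
        self.private_dns_name = private_dns_name


def first_usable(*values):
    """Return the first value that is neither empty nor 'N/A'."""
    for value in values:
        if value and value != 'N/A':
            return value
    return None


def instance_addresses(instance):
    """Return (public, private), preferring the ip fields over dns names."""
    public = first_usable(instance.ip_address, instance.dns_name)
    private = first_usable(instance.private_ip_address,
                           instance.private_dns_name)
    return public, private


euca_like = FakeInstance(private_ip_address='N/A',
                         dns_name='192.168.1.10',
                         private_dns_name='10.0.0.5')
print(instance_addresses(euca_like))
```

Because the ip fields are tried first, an instance that already reports them
is unaffected by the fallback.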

~Justin

On Tuesday 30 March 2010 3:56:08 pm Nicholas Ampazis wrote:
> Justin,
>
> Unfortunately I still get the architecture error:
> > 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> > line 541, in __check_platform
> > AttributeError: 'NoneType' object has no attribute 'architecture'
> >
> > I commented the line
> >
> > #image_platform = self.ec2.conn.get_image(image_id).architecture
>
> and I have to set 'x86_64' by hand.
>
> Past this point, this is how far I could get:
>
> cluster.py:526 - WARNING - The AVAILABILITY_ZONE = Zone:UEC-TMOD is
> not available at this time
>
> >>> Starting cluster...
> >>> Launching a 2-node cluster...
> >>> Launching master node...
> >>> Master AMI: emi-A286143C
> >>> Creating security group @sc-masters...
> >>> Creating security group @sc-test...
>
> Traceback (most recent call last):
> File "/usr/local/bin/starcluster", line 5, in <module>
> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 442, in run_script
> self.require(requires)[0].run_script(script_name, ns)
> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 1167, in run_script
> exec script_code in namespace, namespace
> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster", line 6, in <module>
>
> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py", line 588, in main
> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py", line 187, in execute
> File "build/bdist.macosx-10.6-universal/egg/starcluster/utils.py", line 23, in wrapper
> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line 423, in start
> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line 324, in create_cluster
> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line 253, in cluster_group
> File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py", line 125, in get_or_create_group
> File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py", line 98, in create_group
> File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py", line 103, in get_group_or_none
> IndexError: list index out of range
>
>
> Please note that no instance of the "Master AMI: emi-A286143C"
> actually starts. However new security groups have been created as
> reported by "euca-describe-groups"
>
> euca-describe-groups
> GROUP admin @sc-masters StarCluster Master Nodes
> PERMISSION admin @sc-masters ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
> GROUP admin @sc-test Cluster requested at 201003302244
> PERMISSION admin @sc-test ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
> GROUP admin default default group
> PERMISSION admin default ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
>
>
> Thanks,
>
> Nicholas
>
> On Tue, Mar 30, 2010 at 10:30 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi Nicholas,
> >
> > I just fixed the issues you were having that required you to modify a
> > bunch of the code by hand (related to security group stuff). Could you
> > please undo your modifications and pull the latest github code? If
> > necessary, just remove the working directory and re-clone:
> >
> > git clone http://github.com/jtriley/StarCluster.git
> >
> > If you could test that this code gets you to the point of "Waiting for
> > cluster to start" on Eucalyptus, that'd be great. Please let me know if
> > you still need to do this:
> >> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> >> line 541, in __check_platform
> >> AttributeError: 'NoneType' object has no attribute 'architecture'
> >>
> >> I commented the line
> >>
> >> #image_platform = self.ec2.conn.get_image(image_id).architecture
> >
> > However, beyond the "Waiting for cluster to start", I'm afraid I've now
> > reached as far as I can go with Eucalyptus and StarCluster due to the
> > fact that boto cannot report the private/public ip addresses of
> > eucalyptus instances (ie we get N/A's for ip addresses with $starcluster
> > listinstances).
> >
> > There's really not any way to address this cleanly without breaking EC2
> > support that I can think of. The ip address info that I need gets put
> > into the dns_name and private_dns_name by Eucalyptus, however, using
> > this would be a serious hack and certainly make the code uglier.
> >
> > As I said before, I'm not sure whether a more sophisticated dns setup
> > would get Eucalyptus to respond with these ip addresses or whether it's
> > not possible at all with Eucalyptus.
> >
> > In either case, I think we need to do some investigation on this before
> > moving forward.
> >
> > ~Justin
> >
> > On 03/29/2010 02:28 PM, Nicholas Ampazis wrote:
> >> Justin,
> >>
> >> I'm sorry, but I forgot to tell you that I also made the following
> >> "manual" modifications before I was able to reach the point of my
> >> previous e-mail (the "'str' object has no attribute 'instances'" error)
> >>
> >>
> >> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> >> line 541, in __check_platform
> >> AttributeError: 'NoneType' object has no attribute 'architecture'
> >>
> >> I commented the line
> >>
> >> #image_platform = self.ec2.conn.get_image(image_id).architecture
> >>
> >> and added
> >>
> >> image_platform = "x86_64"
> >>
> >>
> >> 2) File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
> >> line 106, in get_or_create_group
> >> IndexError: list index out of range
> >>
> >> I commented out
> >>
> >> #sg = self.conn.get_all_security_groups(
> >> # groupnames=[name])[0]
> >>
> >> and added
> >>
> >> sg='default'
> >>
> >>
> >> 3) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> >> line 323, in create_cluster
> >> AttributeError: 'str' object has no attribute 'name'
> >>
> >> I commented out the lines
> >>
> >> #master_sg = self.master_group.name
> >> #cluster_sg = self.cluster_group.name
> >>
> >> and added
> >>
> >> master_sg = 'default'
> >> cluster_sg = 'default'
> >>
> >>
> >> Thanks,
> >>
> >> Nicholas
> >>
> >> On Mon, Mar 29, 2010 at 11:11 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> >> Nicholas,
> >>
> >> I've fixed the VOLUMES issue on github. I'll report back once I've
> >> checked out the other "AttributeError: 'str' object has no attribute
> >> 'instances'" error.
> >>
> >> Thanks,
> >>
> >> ~Justin
> >>
> >> On 03/29/2010 01:31 PM, Nicholas Ampazis wrote:
> >>>>> Justin,
> >>>>>
> >>>>> For some reason VOLUMES must be defined in the config file otherwise
> >>>>> "start" command ends with the following message:
> >>>>>
> >>>>> Traceback (most recent call last):
> >>>>> File "/usr/local/bin/starcluster", line 5, in <module>
> >>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
> >>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 442, in run_script
> >>>>> self.require(requires)[0].run_script(script_name, ns)
> >>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 1167, in run_script
> >>>>> exec script_code in namespace, namespace
> >>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster", line 6, in <module>
> >>>>>
> >>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>> line 585, in main
> >>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>> line 184, in execute
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 480, in is_valid
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 619, in _validate_ebs_settings
> >>>>> TypeError: 'NoneType' object is not iterable
> >>>>>
> >>>>>
> >>>>> I defined a VOLUME in the config and reran the start command. This
> >>>>> is how far I got:
> >>>>>
> >>>>> starcluster start smallcluster test
> >>>>>
> >>>>>>>> Starting cluster...
> >>>>>>>> Launching a 2-node cluster...
> >>>>>>>> Launching master node...
> >>>>>>>> Master AMI: emi-A286143C
> >>>>>
> >>>>> Reservation:r-4BC1082B
> >>>>>
> >>>>>>>> Launching worker nodes...
> >>>>>>>> Node AMI: emi-A286143C
> >>>>>
> >>>>> Reservation:r-458E0899
> >>>>>
> >>>>>>>> Waiting for cluster to start...|Traceback (most recent call last):
> >>>>>
> >>>>> File "/usr/local/bin/starcluster", line 5, in <module>
> >>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
> >>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 442, in run_script
> >>>>> self.require(requires)[0].run_script(script_name, ns)
> >>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 1167, in run_script
> >>>>> exec script_code in namespace, namespace
> >>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster", line 6, in <module>
> >>>>>
> >>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>> line 585, in main
> >>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>> line 186, in execute
> >>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/utils.py",
> >>>>> line 23, in wrapper
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 429, in start
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 374, in is_cluster_up
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 304, in running_nodes
> >>>>> File
> >>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py", line
> >>>>> 272, in nodes
> >>>>> AttributeError: 'str' object has no attribute 'instances'
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>
> >>>>> Nicholas
> >>>>>
> >>>>> On Mon, Mar 29, 2010 at 10:24 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> >>>>> Hi Nicholas,
> >>>>>
> >>>>> Hmmm, this is interesting. The reason this is happening is because
> >>>>> zone.state is returning the controller's ip address rather than
> >>>>> 'available' as it does when using EC2. I just tested this with my
> >>>>> local Eucalyptus. It appears that Eucalyptus does not have support
> >>>>> for availability zone 'states' like Amazon EC2 does.
> >>>>>
> >>>>> So, I've relaxed the check for availability zone to simply print a
> >>>>> warning rather than erroring out if the zone state is not
> >>>>> 'available'. As long as StarCluster can retrieve the zone it allows
> >>>>> the cluster validation to proceed.
> >>>>>
> >>>>> You will see a warning about the availability zone when using
> >>>>> Eucalyptus although it should be safe to ignore.
> >>>>>
> >>>>> Please try again with latest dev code and report back.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> ~Justin
> >>>>>
> >>>>> On 03/29/2010 12:08 PM, Nicholas Ampazis wrote:
> >>>>>>>> Justin,
> >>>>>>>>
> >>>>>>>> Thanks for the update. I've downloaded the latest development
> >>>>>>>> version using git.
> >>>>>>>>
> >>>>>>>> This is what I got when I invoked "starcluster start smallcluster
> >>>>>>>> test" (there is a cluster template "smallcluster" defined in the
> >>>>>>>> configuration file):
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> cluster.py:525 - ERROR - The AVAILABILITY_ZONE = %s is not
> >>>>>>>> available at this time
> >>>>>>>> Traceback (most recent call last):
> >>>>>>>> File "/usr/local/bin/starcluster", line 5, in <module>
> >>>>>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
> >>>>>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 442, in run_script
> >>>>>>>> self.require(requires)[0].run_script(script_name, ns)
> >>>>>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 1167, in run_script
> >>>>>>>> exec script_code in namespace, namespace
> >>>>>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster", line 6, in <module>
> >>>>>>>>
> >>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>>>>> line 585, in main
> >>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
> >>>>>>>> line 184, in execute
> >>>>>>>> File
> >>>>>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> >>>>>>>> line 478, in is_valid
> >>>>>>>> File
> >>>>>>>> "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> >>>>>>>> line 616, in _validate_ebs_settings
> >>>>>>>> TypeError: 'NoneType' object is not iterable
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Nicholas
> >>>>>>>>
> >>>>>>>> On Mon, Mar 29, 2010 at 8:55 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> >>>>>>>> Hi Nicholas,
> >>>>>>>>
> >>>>>>>> Awesome, I'll try to wget that url on a local instance sometime
> >>>>>>>> today and see how it goes just to verify that this is the case (ie
> >>>>>>>> latest points to 1.0 by default on 1.6.2)
> >>>>>>>>
> >>>>>>>> I meant to send you an announcement about this last night, but I
> >>>>>>>> made some modifications that should allow you to get past the
> >>>>>>>> credentials step when starting StarCluster on Eucalyptus. You'll
> >>>>>>>> need to pull in the latest code to test it out.
> >>>>>>>>
> >>>>>>>> Please let me know how things go and what the next obstacles are.
> >>>>>>>>
> >>>>>>>> I am aware of one obstacle that guarantees things will not quite
> >>>>>>>> run successfully. When we do:
> >>>>>>>>
> >>>>>>>> $ starcluster listinstances
> >>>>>>>>
> >>>>>>>> I noticed that for both of us the output of this command on
> >>>>>>>> Eucalyptus reports private_ip_address and ip_address as N/A. These
> >>>>>>>> variables are used by StarCluster to set up things like /etc/hosts,
> >>>>>>>> Sun Grid Engine, etc.
> >>>>>>>>
> >>>>>>>> I have a feeling this is due to needing a more sophisticated DNS
> >>>>>>>> setup with Eucalyptus but I haven't tried to solve this just yet.
> >>>>>>>> In any event, things will almost certainly not work until we can
> >>>>>>>> get these values to be properly populated (ie starcluster
> >>>>>>>> listinstances should show the ip addresses and not N/A's).
> >>>>>>>>
> >>>>>>>> Hope that helps,
> >>>>>>>>
> >>>>>>>> ~Justin
> >>>>>>>>
> >>>>>>>> On 03/29/2010 11:06 AM, Nicholas Ampazis wrote:
> >>>>>>>>>>> Justin,
> >>>>>>>>>>>
> >>>>>>>>>>> I do have a "add_key.pl" in the "/usr/share/eucalyptus"
> >>>>>>>>>>> directory.
> >>>>>>>>>>>
> >>>>>>>>>>> However this might not be very relevant to the process of
> >>>>>>>>>>> copying the ssh key in later versions of Eucalyptus (i.e.
> >>>>>>>>>>> 1.6.x), since I've discovered that I could have achieved the
> >>>>>>>>>>> same fix if I had substituted
> >>>>>>>>>>>
> >>>>>>>>>>> public_key_url=http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key
> >>>>>>>>>>>
> >>>>>>>>>>> by
> >>>>>>>>>>>
> >>>>>>>>>>> public_key_url=http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
> >>>>>>>>>>>
> >>>>>>>>>>> (instead of
> >>>>>>>>>>> public_key_url=http://169.254.169.254/2008-02-01/meta-data/public-keys/0/openssh-key)
> >>>>>>>>>>>
> >>>>>>>>>>> in "/etc/init.d/ec2-get-credentials" of the starcluster iso.
> >>>>>>>>>>>
> >>>>>>>>>>> Notice that in this case "latest" points to the same directory
> >>>>>>>>>>> as "api_ver" which in your eucalyptus installation (1.6.2) just
> >>>>>>>>>>> happens to be "1.0", so it works out of the box!
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Nicholas
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> P.S. Is there any progress in the starcluster git python code
> >>>>>>>>>>> with regards to commands that did not work with eucalyptus
> >>>>>>>>>>> credentials (e.g. starcluster start , etc)?
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Mar 29, 2010 at 7:35 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> >>>>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>>>>>>>>>>> Hash: SHA1
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Nicholas,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Awesome, glad to hear you've got the StarCluster ami working
> >>>>>>>>>>>> with Eucalyptus. I'm still a little curious as to why I didn't
> >>>>>>>>>>>> need those modifications to /etc/init.d/ec2-get-credentials
> >>>>>>>>>>>> and you did.
> >>>>>>>>>>>>
> >>>>>>>>>>>> My current theory on this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I believe that Eucalyptus is running the script
> >>>>>>>>>>>> $EUCALYPTUS/usr/share/eucalyptus/add_key.pl somewhere in the
> >>>>>>>>>>>> process of bringing the instance up.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Looking at this script it appears that they manually pipe the
> >>>>>>>>>>>> pub key into root's authorized_keys file (ie they're mounting
> >>>>>>>>>>>> the iso and creating the authorized_keys outside of the
> >>>>>>>>>>>> instance).
> >>>>>>>>>>>>
> >>>>>>>>>>>> My only guess as to why my EMI worked out of the box with
> >>>>>>>>>>>> respect to ssh is because of this script. Maybe it's not being
> >>>>>>>>>>>> executed for some reason?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can you check if that script exists for you in
> >>>>>>>>>>>> /usr/share/eucalyptus?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks and in any event, thanks for tracking this down :D
> >>>>>>>>>>>>
> >>>>>>>>>>>> ~Justin
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v2.0.14 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> >
> > iEYEARECAAYFAkuyUWcACgkQ4llAkMfDcrk8ygCeLTf+MMo8I2er7PAQ2RbIijvQ
> > 64QAn0pojZ4Phyz2jqWvVzOOIJkQ1xFS
> > =Nh2d
> > -----END PGP SIGNATURE-----
>
Received on Wed Mar 31 2010 - 00:20:35 EDT
This archive was generated by hypermail 2.3.0.
