StarCluster - Mailing List Archive

Re: [Starcluster] SOLVED!! instance ssh problem...

From: Justin Riley <no email>
Date: Tue, 30 Mar 2010 15:30:47 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Nicholas,

I just fixed the issues (related to security groups) that required you
to modify a bunch of the code by hand. Could you please undo your
modifications and pull the latest code from GitHub? If necessary, just
remove the working directory and re-clone:

git clone http://github.com/jtriley/StarCluster.git

If you could test that this code gets you to the point of "Waiting for
cluster to start" on Eucalyptus, that'd be great. Please let me know if
you still need to do this:

> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> line 541, in __check_platform
> AttributeError: 'NoneType' object has no attribute 'architecture'
>
> I commented out the line
>
> #image_platform = self.ec2.conn.get_image(image_id).architecture
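For reference, the first workaround can be avoided with a guarded lookup that reports a missing image clearly instead of hard-coding "x86_64". This is a minimal sketch, not StarCluster's actual code: the function name get_image_architecture and the choice of ValueError are hypothetical, and it only assumes a boto-style connection whose get_image() returns None when the image is not found (the case behind the AttributeError above).

```python
def get_image_architecture(conn, image_id):
    """Return the architecture of image_id, failing with a clear message.

    Assumes a boto-style connection whose get_image() returns None when the
    image cannot be found, instead of letting ".architecture" blow up on None.
    """
    image = conn.get_image(image_id)
    if image is None:
        raise ValueError("image %s was not found or is not visible to this "
                         "account" % image_id)
    return image.architecture
```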

However, beyond "Waiting for cluster to start", I'm afraid I've now
reached as far as I can go with Eucalyptus and StarCluster, because
boto cannot report the private/public IP addresses of Eucalyptus
instances (i.e., "starcluster listinstances" shows N/A for the IP
addresses).

I can't think of any way to address this cleanly without breaking EC2
support. Eucalyptus puts the IP address info that I need into dns_name
and private_dns_name; however, using those fields would be a serious
hack and would certainly make the code uglier.
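To make the trade-off concrete, here is roughly what such a fallback might look like if it were confined to a single helper. This is a hypothetical sketch, not code from StarCluster or boto: resolve_addresses and _as_ip are illustrative names, and it assumes a boto-style instance object with ip_address, private_ip_address, dns_name and private_dns_name attributes. It only borrows the dns_name fields when they hold a literal IP address, so EC2 hostnames are never mistaken for IPs.

```python
import ipaddress


def _as_ip(value):
    """Return value if it parses as an IPv4/IPv6 address, else None."""
    if not value:
        return None
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return None
    return value


def resolve_addresses(instance):
    """Return (public_ip, private_ip) for a boto-style instance object.

    Prefers ip_address/private_ip_address (populated on EC2) and falls back
    to dns_name/private_dns_name only when those contain a literal IP, which
    is where Eucalyptus reportedly puts the addresses.
    """
    public = instance.ip_address or _as_ip(instance.dns_name)
    private = instance.private_ip_address or _as_ip(instance.private_dns_name)
    return public, private
```

Even in this contained form, every consumer of the addresses would have to go through the helper, which is part of why the fallback reads as a hack.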

As I said before, I'm not sure whether a more sophisticated DNS setup
would get Eucalyptus to report these IP addresses, or whether that's
simply not possible with Eucalyptus.

In either case, I think we need to do some investigation on this before
moving forward.

~Justin

On 03/29/2010 02:28 PM, Nicholas Ampazis wrote:
> Justin,
>
> I'm sorry, but I forgot to tell you that I had also made the following
> "manual" modifications before I was able to reach the point described in
> my previous e-mail (the "'str' object has no attribute 'instances'" error):
>
>
> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> line 541, in __check_platform
> AttributeError: 'NoneType' object has no attribute 'architecture'
>
> I commented out the line
>
> #image_platform = self.ec2.conn.get_image(image_id).architecture
>
> and added
>
> image_platform = "x86_64"
>
>
> 2) File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
> line 106, in get_or_create_group
> IndexError: list index out of range
>
> I commented out
>
> #sg = self.conn.get_all_security_groups(
> # groupnames=[name])[0]
>
> and added
>
> sg='default'
>
>
> 3) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> line 323, in create_cluster
> AttributeError: 'str' object has no attribute 'name'
>
> I commented out the lines
>
> #master_sg = self.master_group.name
> #cluster_sg = self.cluster_group.name
>
> and added
>
> master_sg = 'default'
> cluster_sg = 'default'
>
>
> Thanks,
>
> Nicholas
>
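Modifications 2 and 3 are related: replacing the group lookup with the plain string 'default' is what likely caused the later "'str' object has no attribute 'name'" and "'instances'" errors, since the rest of the code expects a security-group object. A minimal sketch of a lookup that handles the empty list Eucalyptus returned (the IndexError in item 2) while always returning an object, assuming boto-style get_all_security_groups(groupnames=[...]) and create_security_group(name, description) calls. This is an illustration, not the fix that went into StarCluster; on EC2 a missing group name may raise an error rather than return an empty list, which this sketch does not handle.

```python
def get_or_create_group(conn, name, description):
    """Return a security-group object for name, creating it if absent.

    Handles a lookup that returns an empty list (instead of indexing [0]
    blindly) and never returns a bare string, so callers can keep using
    attributes such as .name and methods such as .instances().
    """
    groups = conn.get_all_security_groups(groupnames=[name])
    if groups:
        return groups[0]
    return conn.create_security_group(name, description)
```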
> On Mon, Mar 29, 2010 at 11:11 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> Nicholas,
>
> I've fixed the VOLUMES issue on GitHub. I'll report back once I've
> looked into the other "AttributeError: 'str' object has no attribute
> 'instances'" error.
>
> Thanks,
>
> ~Justin
>
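The VOLUMES traceback quoted below ends in "TypeError: 'NoneType' object is not iterable" inside _validate_ebs_settings, which is what happens when code loops over a setting that was never defined. The usual remedy is to treat a missing setting as an empty list. This is a hypothetical sketch, not the fix Justin committed: validate_ebs_settings and the 'volume_id'/'mount_path' keys are illustrative, and StarCluster's real volume configuration format is not reproduced here.

```python
def validate_ebs_settings(volumes):
    """Return a list of problems found in the EBS volume settings.

    volumes is assumed to be a list of dicts with hypothetical keys
    'volume_id' and 'mount_path'; None means VOLUMES was not configured.
    """
    problems = []
    seen_paths = set()
    # "or []" turns an unset VOLUMES (None) into "nothing to check"
    # instead of raising "'NoneType' object is not iterable".
    for vol in volumes or []:
        if not vol.get("volume_id"):
            problems.append("volume missing volume_id")
        path = vol.get("mount_path")
        if not path:
            problems.append("volume %s missing mount_path" % vol.get("volume_id"))
        elif path in seen_paths:
            problems.append("mount_path %s used more than once" % path)
        else:
            seen_paths.add(path)
    return problems
```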
> On 03/29/2010 01:31 PM, Nicholas Ampazis wrote:
>>>> Justin,
>>>>
>>>> For some reason VOLUMES must be defined in the config file; otherwise
>>>> the "start" command ends with the following message:
>>>>
>>>> Traceback (most recent call last):
>>>> File "/usr/local/bin/starcluster", line 5, in <module>
>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>> line 442, in run_script
>>>> self.require(requires)[0].run_script(script_name, ns)
>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>> line 1167, in run_script
>>>> exec script_code in namespace, namespace
>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>> line 6, in <module>
>>>>
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>> line 585, in main
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>> line 184, in execute
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 480, in is_valid
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 619, in _validate_ebs_settings
>>>> TypeError: 'NoneType' object is not iterable
>>>>
>>>>
>>>> I defined a VOLUME in the config and re-ran the start command. This is
>>>> how far I got:
>>>>
>>>> starcluster start smallcluster test
>>>>
>>>>>>> Starting cluster...
>>>>>>> Launching a 2-node cluster...
>>>>>>> Launching master node...
>>>>>>> Master AMI: emi-A286143C
>>>> Reservation:r-4BC1082B
>>>>>>> Launching worker nodes...
>>>>>>> Node AMI: emi-A286143C
>>>> Reservation:r-458E0899
>>>>>>> Waiting for cluster to start...|Traceback (most recent call last):
>>>> File "/usr/local/bin/starcluster", line 5, in <module>
>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>> line 442, in run_script
>>>> self.require(requires)[0].run_script(script_name, ns)
>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>> line 1167, in run_script
>>>> exec script_code in namespace, namespace
>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>> line 6, in <module>
>>>>
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>> line 585, in main
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>> line 186, in execute
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/utils.py",
>>>> line 23, in wrapper
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 429, in start
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 374, in is_cluster_up
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 304, in running_nodes
>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>> line 272, in nodes
>>>> AttributeError: 'str' object has no attribute 'instances'
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Nicholas
>>>>
>>>> On Mon, Mar 29, 2010 at 10:24 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>> Hi Nicholas,
>>>>
>>>> Hmmm, this is interesting. This is happening because zone.state
>>>> returns the controller's IP address rather than 'available' as it
>>>> does on EC2. I just tested this with my local Eucalyptus. It appears
>>>> that Eucalyptus does not support availability zone 'states' the way
>>>> Amazon EC2 does.
>>>>
>>>> So, I've relaxed the availability zone check to simply print a
>>>> warning rather than error out if the zone state is not 'available'.
>>>> As long as StarCluster can retrieve the zone, it allows cluster
>>>> validation to proceed.
>>>>
>>>> You will see a warning about the availability zone when using Eucalyptus,
>>>> but it should be safe to ignore.
>>>>
>>>> Please try again with the latest dev code and report back.
>>>>
>>>> Thanks,
>>>>
>>>> ~Justin
>>>>
>>>> On 03/29/2010 12:08 PM, Nicholas Ampazis wrote:
>>>>>>> Justin,
>>>>>>>
>>>>>>> Thanks for the update. I've downloaded the latest development version
>>>>>>> using git.
>>>>>>>
>>>>>>> This is what I got when I invoked "starcluster start smallcluster
>>>>>>> test" (there is a cluster template "smallcluster" defined in the
>>>>>>> configuration file):
>>>>>>>
>>>>>>>
>>>>>>> cluster.py:525 - ERROR - The AVAILABILITY_ZONE = %s is not available
>>>>>>> at this time
>>>>>>> Traceback (most recent call last):
>>>>>>> File "/usr/local/bin/starcluster", line 5, in <module>
>>>>>>> pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>>>> line 442, in run_script
>>>>>>> self.require(requires)[0].run_script(script_name, ns)
>>>>>>> File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>>>> line 1167, in run_script
>>>>>>> exec script_code in namespace, namespace
>>>>>>> File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>>>>> line 6, in <module>
>>>>>>>
>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>>>> line 585, in main
>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>>>> line 184, in execute
>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>>>> line 478, in is_valid
>>>>>>> File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>>>> line 616, in _validate_ebs_settings
>>>>>>> TypeError: 'NoneType' object is not iterable
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Nicholas
>>>>>>>
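The relaxed availability-zone check described above can be sketched as a warning instead of a hard failure. Note also the literal "%s" in the logged error quoted above ("The AVAILABILITY_ZONE = %s is not available"): the zone name was never substituted into the message. Passing the value as an argument to the logger avoids that. This is a hypothetical illustration, not StarCluster's code; check_zone is an illustrative name and it assumes a boto-style zone object with name and state attributes.

```python
import logging

log = logging.getLogger("starcluster-sketch")


def check_zone(zone):
    """Warn, rather than fail, when a zone's state is not 'available'.

    Eucalyptus reportedly returns the controller's IP address in zone.state,
    so only a zone that cannot be retrieved at all is treated as fatal.
    Returns True if the zone reports 'available', False otherwise.
    """
    if zone is None:
        raise ValueError("availability zone could not be retrieved")
    if zone.state != "available":
        # The zone name is passed as a logger argument, so %s is filled in.
        log.warning("The AVAILABILITY_ZONE = %s is not available at this time "
                    "(state reported as %r)", zone.name, zone.state)
        return False
    return True
```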
>>>>>>> On Mon, Mar 29, 2010 at 8:55 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>>>>> Hi Nicholas,
>>>>>>>
>>>>>>> Awesome, I'll try to wget that URL on a local instance sometime today
>>>>>>> to verify that this is the case (i.e., that "latest" points to 1.0
>>>>>>> by default on 1.6.2).
>>>>>>>
>>>>>>> I meant to send you an announcement last night: I've made some
>>>>>>> modifications that should allow you to get past the credentials step
>>>>>>> when starting StarCluster on Eucalyptus. You'll need to pull in the
>>>>>>> latest code to test them.
>>>>>>>
>>>>>>> Please let me know how things go and what the next obstacles are.
>>>>>>>
>>>>>>> I am aware of one obstacle that will keep things from running
>>>>>>> successfully. When we do:
>>>>>>>
>>>>>>> $ starcluster listinstances
>>>>>>>
>>>>>>> I noticed that, in both of our outputs, this command reports
>>>>>>> private_ip_address and ip_address as N/A under Eucalyptus. StarCluster
>>>>>>> uses these variables to set up things like /etc/hosts, Sun Grid Engine, etc.
>>>>>>>
>>>>>>> I suspect this is because Eucalyptus needs a more sophisticated DNS
>>>>>>> setup, but I haven't tried to solve this yet. In any event, things
>>>>>>> will almost certainly not work until these values are properly
>>>>>>> populated (i.e., starcluster listinstances should show the IP
>>>>>>> addresses and not N/A's).
>>>>>>>
>>>>>>> Hope that helps,
>>>>>>>
>>>>>>> ~Justin
>>>>>>>
>>>>>>>
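To see why the N/A values are a blocker, consider building /etc/hosts entries from each node's private IP, one of the uses mentioned above. This is a hypothetical sketch, not StarCluster's implementation: hosts_entries is an illustrative name, and the aliases in the test are made up. A node with no private IP simply has nothing to write, so the sketch reports it instead of silently producing an incomplete hosts file.

```python
def hosts_entries(nodes):
    """Build /etc/hosts lines from (alias, private_ip) pairs.

    A node whose private IP is missing (shown as N/A by listinstances)
    cannot be given an entry, so it is reported rather than skipped.
    """
    nodes = list(nodes)
    missing = [alias for alias, ip in nodes if not ip]
    if missing:
        raise ValueError("no private IP address for: %s" % ", ".join(missing))
    return "".join("%s %s\n" % (ip, alias) for alias, ip in nodes)
```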
>>>>>>> On 03/29/2010 11:06 AM, Nicholas Ampazis wrote:
>>>>>>>>>> Justin,
>>>>>>>>>>
>>>>>>>>>> I do have a "add_key.pl" in the "/usr/share/eucalyptus" directory.
>>>>>>>>>>
>>>>>>>>>> However, this might not be very relevant to how the ssh key is copied
>>>>>>>>>> in later versions of Eucalyptus (i.e. 1.6.x), since I've discovered
>>>>>>>>>> that I could have achieved the same fix by replacing
>>>>>>>>>>
>>>>>>>>>> public_key_url=http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key
>>>>>>>>>>
>>>>>>>>>> with
>>>>>>>>>>
>>>>>>>>>> public_key_url=http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
>>>>>>>>>>
>>>>>>>>>> (instead of public_key_url=http://169.254.169.254/2008-02-01/meta-data/public-keys/0/openssh-key)
>>>>>>>>>>
>>>>>>>>>> in /etc/init.d/ec2-get-credentials on the StarCluster ISO.
>>>>>>>>>>
>>>>>>>>>> Notice that in this case "latest" points to the same directory as
>>>>>>>>>> "api_ver", which in your Eucalyptus installation (1.6.2) just happens
>>>>>>>>>> to be "1.0", so it works out of the box!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nicholas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> P.S. Is there any progress in the StarCluster git Python code with
>>>>>>>>>> regard to commands that did not work with Eucalyptus credentials
>>>>>>>>>> (e.g. starcluster start, etc.)?
>>>>>>>>>>
>>>>>>>>>>
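The fix above comes down to which API-version path segment is used in the metadata URL: "latest" follows whatever version the metadata service treats as current, while a pinned version such as "2008-02-01" only works if the service serves that version. The actual /etc/init.d/ec2-get-credentials is a shell script and is not reproduced here; the following is a hypothetical Python rendering of the same request (public_key_url and fetch_public_key are illustrative names), and fetch_public_key only succeeds from inside an EC2 or Eucalyptus instance.

```python
import urllib.request

METADATA_HOST = "http://169.254.169.254"


def public_key_url(api_version="latest"):
    """Return the metadata URL of the instance's first OpenSSH public key."""
    return "%s/%s/meta-data/public-keys/0/openssh-key" % (METADATA_HOST,
                                                           api_version)


def fetch_public_key(api_version="latest", timeout=5):
    """Fetch the public key from the metadata service (in-instance only)."""
    with urllib.request.urlopen(public_key_url(api_version),
                                timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```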
>>>>>>>>>> On Mon, Mar 29, 2010 at 7:35 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>>>>> Hash: SHA1
>>>>>>>>>>>
>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>
>>>>>>>>>>> Awesome, glad to hear you've got the StarCluster AMI working with
>>>>>>>>>>> Eucalyptus. I'm still a little curious as to why I didn't need those
>>>>>>>>>>> modifications to /etc/init.d/ec2-get-credentials and you did.
>>>>>>>>>>>
>>>>>>>>>>> My current theory on this:
>>>>>>>>>>>
>>>>>>>>>>> I believe that Eucalyptus is running the script
>>>>>>>>>>> $EUCALYPTUS/usr/share/eucalyptus/add_key.pl somewhere in the process of
>>>>>>>>>>> bringing the instance up.
>>>>>>>>>>>
>>>>>>>>>>> Looking at this script, it appears that they manually pipe the pub key
>>>>>>>>>>> into root's authorized_keys file (i.e., they're mounting the iso and
>>>>>>>>>>> creating the authorized_keys outside of the instance).
>>>>>>>>>>>
>>>>>>>>>>> My only guess is that my EMI worked out of the box with respect to ssh
>>>>>>>>>>> because of this script. Maybe it's not being executed for you for some reason?
>>>>>>>>>>>
>>>>>>>>>>> Can you check if that script exists for you in /usr/share/eucalyptus?
>>>>>>>>>>>
>>>>>>>>>>> In any event, thanks for tracking this down :D
>>>>>>>>>>>
>>>>>>>>>>> ~Justin
>>>>>>>>>>>
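Justin's theory is that add_key.pl writes the public key into root's authorized_keys from outside the instance, on the mounted image. The Perl script itself is not reproduced here; the following is a hypothetical Python illustration of that approach only, assuming the image's root filesystem is already mounted at mount_point (install_root_key is an illustrative name). It appends the key only if it is not already present and sets the permissions sshd normally expects.

```python
import os


def install_root_key(mount_point, pubkey):
    """Append pubkey to root's authorized_keys inside a mounted image.

    Creates <mount_point>/root/.ssh if needed, skips keys that are already
    present, and sets 0700 on the directory and 0600 on the file.
    """
    ssh_dir = os.path.join(mount_point, "root", ".ssh")
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    path = os.path.join(ssh_dir, "authorized_keys")
    key = pubkey.strip()

    content = ""
    if os.path.exists(path):
        with open(path) as fh:
            content = fh.read()

    if key not in (line.strip() for line in content.splitlines()):
        with open(path, "a") as fh:
            # Keep the new key on its own line if the file lacks a final newline.
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write(key + "\n")
    os.chmod(path, 0o600)
    return path
```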

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuyUWcACgkQ4llAkMfDcrk8ygCeLTf+MMo8I2er7PAQ2RbIijvQ
64QAn0pojZ4Phyz2jqWvVzOOIJkQ1xFS
=Nh2d
-----END PGP SIGNATURE-----
Received on Tue Mar 30 2010 - 15:30:49 EDT
This archive was generated by hypermail 2.3.0.
