StarCluster - Mailing List Archive

Re: [Starcluster] SOLVED!! instance ssh problem...

From: Nicholas Ampazis <no email>
Date: Tue, 30 Mar 2010 22:56:08 +0300

Justin,

Unfortunately I still get the architecture error:

> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
> line 541, in __check_platform
> AttributeError: 'NoneType' object has no attribute 'architecture'
>
> I commented out the line
>
> #image_platform = self.ec2.conn.get_image(image_id).architecture

and I have to set 'x86_64' by hand.
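
A minimal sketch of a less invasive workaround than commenting the line out, assuming the error arises because get_image() returns None for the EMI on Eucalyptus. The helper name image_architecture, its default parameter, and the FakeConn stand-in are hypothetical and not part of StarCluster or boto:

```python
def image_architecture(conn, image_id, default="x86_64"):
    """Return the image's architecture, or `default` if it cannot be determined."""
    image = conn.get_image(image_id)
    if image is None:
        # Assumption: the lookup found no image, so fall back to the default.
        return default
    return getattr(image, "architecture", None) or default


class FakeConn(object):
    """Hypothetical stand-in for a connection whose lookup finds no image."""

    def get_image(self, image_id):
        return None


print(image_architecture(FakeConn(), "emi-A286143C"))
```

This keeps the original lookup for images that do report an architecture and only falls back to the hand-set value otherwise.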

Past this point, this is how far I could get:

cluster.py:526 - WARNING - The AVAILABILITY_ZONE = Zone:UEC-TMOD is
not available at this time
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Launching master node...
>>> Master AMI: emi-A286143C
>>> Creating security group @sc-masters...
>>> Creating security group @sc-test...
Traceback (most recent call last):
  File "/usr/local/bin/starcluster", line 5, in <module>
    pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
line 442, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
line 1167, in run_script
    exec script_code in namespace, namespace
  File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
line 6, in <module>

  File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
line 588, in main
  File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
line 187, in execute
  File "build/bdist.macosx-10.6-universal/egg/starcluster/utils.py",
line 23, in wrapper
  File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
line 423, in start
  File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
line 324, in create_cluster
  File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
line 253, in cluster_group
  File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
line 125, in get_or_create_group
  File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
line 98, in create_group
  File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
line 103, in get_group_or_none
IndexError: list index out of range


Please note that no instance of the "Master AMI: emi-A286143C"
actually starts. However new security groups have been created as
reported by "euca-describe-groups"

 euca-describe-groups
GROUP admin @sc-masters StarCluster Master Nodes
PERMISSION admin @sc-masters ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
GROUP admin @sc-test Cluster requested at 201003302244
PERMISSION admin @sc-test ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
GROUP admin default default group
PERMISSION admin default ALLOWS tcp 22 22 FROM CIDR 0.0.0.0/0
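
The IndexError, together with the groups listed above, suggests that the name-filtered lookup returned an empty list even though the groups exist. A minimal sketch of a lookup that tolerates this, assuming the connection exposes boto's get_all_security_groups() and that each group has a name attribute. The names find_group, FakeGroup, and FakeConn are hypothetical and only for illustration:

```python
def find_group(conn, name):
    """Return the security group called `name`, or None if it is not found.

    Lists all groups and filters client-side, so an empty response
    never triggers an IndexError.
    """
    for group in conn.get_all_security_groups():
        if group.name == name:
            return group
    return None


class FakeGroup(object):
    """Hypothetical stand-in for a security group object."""

    def __init__(self, name):
        self.name = name


class FakeConn(object):
    """Hypothetical stand-in connection returning a fixed list of groups."""

    def get_all_security_groups(self):
        return [FakeGroup("example-group"), FakeGroup("default")]


conn = FakeConn()
print(find_group(conn, "default").name)
print(find_group(conn, "missing"))
```

Whether Eucalyptus really returns an empty list for the filtered query here is not confirmed; the sketch only shows how the lookup could avoid indexing into an empty result.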


Thanks,

Nicholas

On Tue, Mar 30, 2010 at 10:30 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Nicholas,
>
> I just fixed the issues you were having that required you to modify a
> bunch of the code by hand (related to security group stuff). Could you
> please undo your modifications and pull the latest github code? If
> necessary, just remove the working directory and re-clone:
>
> git clone http://github.com/jtriley/StarCluster.git
>
> If you could test that this code gets you to the point of "Waiting for
> cluster to start" on Eucalyptus, that'd be great. Please let me know if
> you still need to do this:
>
>> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>> line 541, in __check_platform
>> AttributeError: 'NoneType' object has no attribute 'architecture'
>>
>> I commented out the line
>>
>> #image_platform = self.ec2.conn.get_image(image_id).architecture
>
> However, beyond the "Waiting for cluster to start", I'm afraid I've now
> reached as far as I can go with Eucalyptus and StarCluster due to the
> fact that boto cannot report the private/public ip addresses of
> eucalyptus instances (ie we get N/A's for ip addresses with $starcluster
> listinstances).
>
> There's really not any way to address this cleanly without breaking EC2
> support that I can think of. The ip address info that I need gets put
> into the dns_name and private_dns_name by Eucalyptus, however, using
> this would be a serious hack and certainly make the code uglier.
>
> As I said before, I'm not sure whether a more sophisticated dns setup
> would get Eucalyptus to respond with these ip addresses or whether it's
> not possible at all with Eucalyptus.
>
> In either case, I think we need to do some investigation on this before
> moving forward.
>
> ~Justin
>
> On 03/29/2010 02:28 PM, Nicholas Ampazis wrote:
>> Justin,
>>
>> I'm sorry, but I forgot to tell you that I also made the following
>> "manual" modifications before I was able to reach the point of my
>> previous e-mail (the "'str' object has no attribute 'instances'" error)
>>
>>
>> 1) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>> line 541, in __check_platform
>> AttributeError: 'NoneType' object has no attribute 'architecture'
>>
>> I commented out the line
>>
>> #image_platform = self.ec2.conn.get_image(image_id).architecture
>>
>> and added
>>
>> image_platform = "x86_64"
>>
>>
>> 2) File "build/bdist.macosx-10.6-universal/egg/starcluster/awsutils.py",
>> line 106, in get_or_create_group
>> IndexError: list index out of range
>>
>> I commented out
>>
>> #sg = self.conn.get_all_security_groups(
>> #                groupnames=[name])[0]
>>
>> and added
>>
>> sg='default'
>>
>>
>> 3) File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>> line 323, in create_cluster
>> AttributeError: 'str' object has no attribute 'name'
>>
>> I commented out the lines
>>
>> #master_sg = self.master_group.name
>> #cluster_sg = self.cluster_group.name
>>
>> and added
>>
>> master_sg = 'default'
>> cluster_sg = 'default'
>>
>>
>> Thanks,
>>
>> Nicholas
>>
>> On Mon, Mar 29, 2010 at 11:11 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>> Nicholas,
>>
>> I've fixed the VOLUMES issue on github. I'll report back once I've
>> checked out the other "AttributeError: 'str' object has no attribute
>> 'instances'" error.
>>
>> Thanks,
>>
>> ~Justin
>>
>> On 03/29/2010 01:31 PM, Nicholas Ampazis wrote:
>>>>> Justin,
>>>>>
>>>>> For some reason VOLUMES must be defined in the config file, otherwise
>>>>> the "start" command ends with the following message:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/bin/starcluster", line 5, in <module>
>>>>>     pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>> line 442, in run_script
>>>>>     self.require(requires)[0].run_script(script_name, ns)
>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>> line 1167, in run_script
>>>>>     exec script_code in namespace, namespace
>>>>>   File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>>> line 6, in <module>
>>>>>
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>> line 585, in main
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>> line 184, in execute
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 480, in is_valid
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 619, in _validate_ebs_settings
>>>>> TypeError: 'NoneType' object is not iterable
>>>>>
>>>>>
>>>>> I defined a VOLUME in the config and reran the start command. This is
>>>>> how far I got:
>>>>>
>>>>> starcluster start smallcluster test
>>>>>
>>>>> >>> Starting cluster...
>>>>> >>> Launching a 2-node cluster...
>>>>> >>> Launching master node...
>>>>> >>> Master AMI: emi-A286143C
>>>>> Reservation:r-4BC1082B
>>>>> >>> Launching worker nodes...
>>>>> >>> Node AMI: emi-A286143C
>>>>> Reservation:r-458E0899
>>>>> >>> Waiting for cluster to start...|Traceback (most recent call last):
>>>>>   File "/usr/local/bin/starcluster", line 5, in <module>
>>>>>     pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>> line 442, in run_script
>>>>>     self.require(requires)[0].run_script(script_name, ns)
>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>> line 1167, in run_script
>>>>>     exec script_code in namespace, namespace
>>>>>   File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>>> line 6, in <module>
>>>>>
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>> line 585, in main
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>> line 186, in execute
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/utils.py",
>>>>> line 23, in wrapper
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 429, in start
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 374, in is_cluster_up
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 304, in running_nodes
>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>> line 272, in nodes
>>>>> AttributeError: 'str' object has no attribute 'instances'
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Nicholas
>>>>>
>>>>> On Mon, Mar 29, 2010 at 10:24 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>>> Hi Nicholas,
>>>>>
>>>>> Hmmm, this is interesting. The reason this is happening is because
>>>>> zone.state is returning the controller's ip address rather than
>>>>> 'available' as it does when using EC2. I just tested this with my local
>>>>> Eucalyptus. It appears that Eucalyptus does not have support for
>>>>> availability zone 'states' like Amazon EC2 does.
>>>>>
>>>>> So, I've relaxed the check for availability zone to simply print a
>>>>> warning rather than erroring out if the zone state is not 'available'.
>>>>> As long as StarCluster can retrieve the zone it allows the cluster
>>>>> validation to proceed.
>>>>>
>>>>> You will see a warning about the availability zone when using Eucalyptus
>>>>> although it should be safe to ignore.
>>>>>
>>>>> Please try again with latest dev code and report back.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> ~Justin
>>>>>
>>>>> On 03/29/2010 12:08 PM, Nicholas Ampazis wrote:
>>>>>>>> Justin,
>>>>>>>>
>>>>>>>> Thanks for the update. I've downloaded the latest development version
>>>>>>>> using git.
>>>>>>>>
>>>>>>>> This is what I got when I invoked "starcluster start smallcluster
>>>>>>>> test" (there is a cluster template "smallcluster" defined in the
>>>>>>>> configuration file):
>>>>>>>>
>>>>>>>>
>>>>>>>> cluster.py:525 - ERROR - The AVAILABILITY_ZONE = %s is not available
>>>>>>>> at this time
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/local/bin/starcluster", line 5, in <module>
>>>>>>>>     pkg_resources.run_script('StarCluster==0.9999', 'starcluster')
>>>>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>>>>> line 442, in run_script
>>>>>>>>     self.require(requires)[0].run_script(script_name, ns)
>>>>>>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py",
>>>>>>>> line 1167, in run_script
>>>>>>>>     exec script_code in namespace, namespace
>>>>>>>>   File "/Library/Python/2.6/site-packages/StarCluster-0.9999-py2.6.egg/EGG-INFO/scripts/starcluster",
>>>>>>>> line 6, in <module>
>>>>>>>>
>>>>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>>>>> line 585, in main
>>>>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cli.py",
>>>>>>>> line 184, in execute
>>>>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>>>>> line 478, in is_valid
>>>>>>>>   File "build/bdist.macosx-10.6-universal/egg/starcluster/cluster.py",
>>>>>>>> line 616, in _validate_ebs_settings
>>>>>>>> TypeError: 'NoneType' object is not iterable
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Nicholas
>>>>>>>>
>>>>>>>> On Mon, Mar 29, 2010 at 8:55 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>>>>>> Hi Nicholas,
>>>>>>>>
>>>>>>>> Awesome, I'll try to wget that url on a local instance sometime today
>>>>>>>> and see how it goes just to verify that this is the case (ie latest
>>>>>>>> points to 1.0 by default on 1.6.2)
>>>>>>>>
>>>>>>>> I meant to send you an announcement about this last night, but I made some
>>>>>>>> modifications that should allow you to get past the credentials step
>>>>>>>> when starting StarCluster on Eucalyptus. You'll need to pull in the
>>>>>>>> latest code to test it out.
>>>>>>>>
>>>>>>>> Please let me know how things go and what the next obstacles are.
>>>>>>>>
>>>>>>>> I am aware of one obstacle that guarantees things will not quite run
>>>>>>>> successfully. When we do:
>>>>>>>>
>>>>>>>> $ starcluster listinstances
>>>>>>>>
>>>>>>>> I noticed that both of our outputs of this command using Eucalyptus
>>>>>>>> report private_ip_address and ip_address as N/A. These variables are
>>>>>>>> used by StarCluster to set up things like /etc/hosts, Sun Grid Engine, etc.
>>>>>>>>
>>>>>>>> I have a feeling this is due to needing a more sophisticated DNS setup
>>>>>>>> with Eucalyptus but I haven't tried to solve this just yet. In any
>>>>>>>> event, things will almost certainly not work until we can get these
>>>>>>>> values to be properly populated (ie starcluster listinstances should
>>>>>>>> show the ip addresses and not N/A's).
>>>>>>>>
>>>>>>>> Hope that helps,
>>>>>>>>
>>>>>>>> ~Justin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/29/2010 11:06 AM, Nicholas Ampazis wrote:
>>>>>>>>>>> Justin,
>>>>>>>>>>>
>>>>>>>>>>> I do have an "add_key.pl" in the "/usr/share/eucalyptus" directory.
>>>>>>>>>>>
>>>>>>>>>>> However, this might not be very relevant to the process of copying the
>>>>>>>>>>> ssh key in later versions of Eucalyptus (i.e. 1.6.x), since I've
>>>>>>>>>>> discovered that I could have achieved the same fix if I had
>>>>>>>>>>> substituted
>>>>>>>>>>>
>>>>>>>>>>> public_key_url=http://169.254.169.254/1.0/meta-data/public-keys/0/openssh-key
>>>>>>>>>>>
>>>>>>>>>>> by
>>>>>>>>>>>
>>>>>>>>>>> public_key_url=http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
>>>>>>>>>>>
>>>>>>>>>>> (instead of  public_key_url=http://169.254.169.254/2008-02-01/meta-data/public-keys/0/openssh-key)
>>>>>>>>>>>
>>>>>>>>>>> in "/etc/init.d/ec2-get-credentials" of the starcluster iso.
>>>>>>>>>>>
>>>>>>>>>>> Notice that in this case "latest" points to the same directory as
>>>>>>>>>>> "api_ver" which in your eucalyptus installation (1.6.2) just happens
>>>>>>>>>>> to be "1.0", so it works out of the box!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nicholas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> P.S. Is there any progress in the starcluster git python code with
>>>>>>>>>>> regards to commands that did not work with eucalyptus credentials
>>>>>>>>>>> (e.g. starcluster start , etc)?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 29, 2010 at 7:35 AM, Justin Riley <jtriley_at_mit.edu> wrote:
>>>>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>>>>>> Hash: SHA1
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>>
>>>>>>>>>>>> Awesome, glad to hear you've got the StarCluster ami working with
>>>>>>>>>>>> Eucalyptus. I'm still a little curious as to why I didn't need those
>>>>>>>>>>>> modifications to /etc/init.d/ec2-get-credentials and you did.
>>>>>>>>>>>>
>>>>>>>>>>>> My current theory on this:
>>>>>>>>>>>>
>>>>>>>>>>>> I believe that Eucalyptus is running the script
>>>>>>>>>>>> $EUCALYPTUS/usr/share/eucalyptus/add_key.pl somewhere in the process of
>>>>>>>>>>>> bringing the instance up.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at this script it appears that they manually pipe the pub key
>>>>>>>>>>>> into root's authorized_keys file (ie they're mounting the iso and
>>>>>>>>>>>> creating the authorized_keys outside of the instance).
>>>>>>>>>>>>
>>>>>>>>>>>> My only guess as to why my EMI worked out of the box with respect to ssh
>>>>>>>>>>>> is because of this script. Maybe it's not being executed for some reason?
>>>>>>>>>>>>
>>>>>>>>>>>> Can you check if that script exists for you in /usr/share/eucalyptus?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks and in any event, thanks for tracking this down :D
>>>>>>>>>>>>
>>>>>>>>>>>> ~Justin
>>>>>>>>>>>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.14 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkuyUWcACgkQ4llAkMfDcrk8ygCeLTf+MMo8I2er7PAQ2RbIijvQ
> 64QAn0pojZ4Phyz2jqWvVzOOIJkQ1xFS
> =Nh2d
> -----END PGP SIGNATURE-----
>
Received on Tue Mar 30 2010 - 15:56:29 EDT