StarCluster - Mailing List Archive

Re: post wedge cleanup

From: Steve Heistand <no email>
Date: Thu, 17 Oct 2013 07:24:06 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(govt shutdown delay in answering... )

I don't remember which version of the VPC code I had installed, and I can't quite access
my cloud machines yet this morning.
As for the other questions: I did modify the code. I'm running CentOS on the
compute instances, and StarCluster assumes Ubuntu throughout, so some OS-related
tweaks are needed. But nothing on the VPC side of things that I recall.

The main problem (on my part) was that I didn't notice that StarCluster creates some
security groups that didn't get cleaned up when I shut things down.
Going into the EC2 console web page and removing them manually allowed
StarCluster to work just fine.
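
For anyone who would rather script that cleanup than click through the console,
here is a minimal sketch of the same idea. It is not part of StarCluster. It
assumes boto 2.x (the library that appears in the traceback below), and the
function names, the region, and the exact group name ("sc-hos", as printed by
listclusters) are illustrative assumptions. It lists every group and filters on
the name client-side because, as far as I know, looking groups up by name
through the EC2 API is only reliable outside a non-default VPC.

```python
def leftover_groups(conn, group_name):
    """Return the security groups whose name equals group_name.

    conn is assumed to be a boto 2.x EC2Connection (or anything else that
    offers get_all_security_groups()).
    """
    return [g for g in conn.get_all_security_groups() if g.name == group_name]


def delete_leftover_groups(conn, group_name, dry_run=True):
    """Delete the matching groups and return their ids.

    With dry_run=True (the default) nothing is deleted; the ids that
    would be removed are only reported.
    """
    matched = []
    for group in leftover_groups(conn, group_name):
        if not dry_run:
            # EC2 is expected to refuse this while instances still use the group.
            group.delete()
        matched.append(group.id)
    return matched


# Hypothetical usage (needs AWS credentials; the region is an assumption):
#   import boto.ec2
#   conn = boto.ec2.connect_to_region("us-east-1")
#   print(delete_leftover_groups(conn, "sc-hos"))            # report only
#   delete_leftover_groups(conn, "sc-hos", dry_run=False)    # really delete
```

Running it once with the default dry_run=True shows which group ids would go
away before anything is actually deleted.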

thanks

s


On 10/02/2013 09:28 AM, Alex Gaudio wrote:
> Hi Steve,
>
> Sorry for not responding sooner - I didn't see this in my email until now. I have
> never encountered those errors, and not knowing what you did to produce them, I'm not
> sure how helpful I can be. Here's a stab at it:
>
> Based on the error, it seems that you deleted the nodes but didn't delete
> the security group. You should delete the security group in the AWS console in
> addition to terminating the nodes.
>
> To the larger problem of why you got that error in the first place, here are some
> questions:
> - The VPC code you need to use is here:
>   https://github.com/jtriley/StarCluster/pull/236 - did you install it correctly?
> - Did you modify the code base to get this to work? (You don't need to.)
> - Did you attempt to start the cluster from inside the VPC? If you don't, the
>   initialization should hang (but should be recoverable).
> - If you still have problems, have you tried spinning up non-spot (flat rate)
>   instances in the VPC? I know for sure that those work. There are more things that
>   can go wrong with spot instances, so I'd try a non-spot cluster first.
>
> I just spun up a spot instance cluster with no problems, so it should work for you.
> If you get things working, I'd love to know what you did to fix it.
>
> Alex
>
>
> On Mon, Sep 16, 2013 at 3:22 PM, Steve Heistand <steve.heistand_at_nasa.gov> wrote:
>
> hi folks,
>
> So I'm playing around with the VPC version of StarCluster and in the process have
> gotten things a little wedged. Errors popped up in the start process, which I fixed, but at
> some point a cluster got up enough to be "alive" but not really working. I had to
> clean things up via the AWS console, and now I can't clean up what StarCluster thinks
> is going on:
>
> # starcluster listclusters
> StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> *** WARNING - Setting 'EC2_PRIVATE_KEY' from environment...
> *** WARNING - Setting 'EC2_CERT' from environment...
> ----------------------------
> hos (security group: sc-hos)
> ----------------------------
> Launch time: N/A
> Uptime: N/A
> Zone: N/A
> Keypair: N/A
> EBS volumes: N/A
> !!! ERROR - InvalidPermission.NotFound: The specified rule does not exist in this security group.
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cli.py", line 274, in main
>     sc.execute(args)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/listclusters.py", line 36, in execute
>     show_ssh_status=self.opts.show_ssh_status)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 331, in list_clusters
>     spot_reqs = cl.spot_requests
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 777, in spot_requests
>     filters = {'launch.group-id': self.cluster_group.id,
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 684, in cluster_group
>     static.WORLD_CIDRIP)
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/ec2/securitygroup.py", line 222, in revoke
>     src_group_group_id)
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/ec2/connection.py", line 2634, in revoke_security_group
>     params, verb='POST')
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/connection.py", line 1115, in get_status
>     raise self.ResponseError(response.status, response.reason, body)
> EC2ResponseError: EC2ResponseError: 400 Bad Request
> <?xml version="1.0" encoding="UTF-8"?>
> <Response><Errors><Error><Code>InvalidPermission.NotFound</Code><Message>The specified rule does not exist in this security group.</Message></Error></Errors><RequestID>d4d51a6a-cb7d-471f-b1aa-54a4ceb35ede</RequestID></Response>
>
>
>
> Is there a database that can be modified to remove the record of this bad cluster,
> or some other method of cleaning things up?
>
> starcluster terminate -f cluster_name also fails in various and exciting ways.
>
> thanks
>
> s
>
>> _______________________________________________ StarCluster mailing list
>> StarCluster_at_mit.edu http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>

- --
************************************************************************
 Steve Heistand NASA Ames Research Center
 email: steve.heistand_at_nasa.gov Steve Heistand/Mail Stop 258-6
 ph: (650) 604-4369 Bldg. 258, Rm. 232-5
 Scientific & HPC Application P.O. Box 1
 Development/Optimization Moffett Field, CA 94035-0001
************************************************************************
 "Any opinions expressed are those of our alien overlords, not my own."

# For Remedy #
#Action: Resolve #
#Resolution: Resolved #
#Reason: No Further Action Required #
#Tier1: User Code #
#Tier2: Other #
#Tier3: Assistance #
#Notification: None #
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlJf8wYACgkQoBCTJSAkVrE7lQCeOgg75KxJp3bmCApCr9xnGSfe
E1YAmgKv4e12WGNnqOkPAdgueHyh/G/R
=QVz5
-----END PGP SIGNATURE-----
Received on Thu Oct 17 2013 - 10:24:11 EDT
This archive was generated by hypermail 2.3.0.
