StarCluster - Mailing List Archive

Re: post wedge cleanup

From: Alex Gaudio <no email>
Date: Wed, 2 Oct 2013 12:28:27 -0400

Hi Steve,

Sorry for not responding sooner; I didn't see this in my email until now.
I have never encountered those errors, and since I don't know what you did
to produce them, I'm not sure how helpful I can be. Here's a stab at it:

Based on the error, it seems that you deleted the nodes but didn't delete
the security group, which is why listclusters still reports "hos" (security
group sc-hos) with everything else N/A. You should delete the security group
in the AWS console in addition to terminating the nodes.
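If you'd rather script that cleanup than click through the console, something like the following might work. It is a rough sketch, not part of StarCluster: `cleanup_cluster_group` is a hypothetical helper, and it assumes a boto 2.x EC2 connection object plus the group name `sc-hos` from the listclusters output. It cancels open spot requests launched into the group (the same `launch.group-id` filter that shows up in the traceback), refuses to continue while any instance in the group hasn't reached `terminated`, and then deletes the group by id, which EC2 requires for groups in a non-default VPC.

```python
# Hypothetical helper, not part of StarCluster: a sketch assuming a boto 2.x
# EC2 connection object and a leftover cluster group such as "sc-hos".

def cleanup_cluster_group(conn, group_name):
    """Cancel open spot requests for a leftover cluster group, then delete it.

    Returns the ids of the spot requests that were cancelled. Raises
    RuntimeError (and deletes nothing) while any instance in the group
    has not reached the 'terminated' state.
    """
    # Groups in a non-default VPC must be looked up with the 'group-name'
    # filter rather than by name, so use the filter in every case.
    groups = conn.get_all_security_groups(filters={'group-name': group_name})
    cancelled = []
    for group in groups:
        # Cancel open spot requests first so nothing new launches into the
        # group (same 'launch.group-id' filter StarCluster uses).
        requests = conn.get_all_spot_instance_requests(
            filters={'launch.group-id': group.id, 'state': 'open'})
        ids = [req.id for req in requests]
        if ids:
            conn.cancel_spot_instance_requests(ids)
            cancelled.extend(ids)

        # EC2 will not delete a group that instances still use, so check.
        reservations = conn.get_all_instances(
            filters={'instance.group-id': group.id})
        live = [inst.id for res in reservations for inst in res.instances
                if inst.state != 'terminated']
        if live:
            raise RuntimeError('%s is still used by: %s'
                               % (group.id, ', '.join(live)))

        # Delete by id; a non-default VPC group cannot be deleted by name.
        conn.delete_security_group(group_id=group.id)
    return cancelled
```

Usage would be along the lines of `cleanup_cluster_group(boto.ec2.connect_to_region('us-east-1'), 'sc-hos')`, with the region being whichever one the cluster ran in. Checking for live instances before deleting avoids a half-finished cleanup, since EC2 rejects deleting a group that is still in use.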

To the larger problem of why you got that error in the first place, here
are some questions:
 - The VPC code you need to use is here:
https://github.com/jtriley/StarCluster/pull/236
    - Did you install it correctly?
 - Did you modify the code base to get this to work? (You don't need to).
 - Did you attempt to start the cluster from inside the VPC? If you don't,
the initialization will probably hang (but should be recoverable).
 - If you still have problems, have you tried spinning up non-spot (flat
rate) instances in the VPC? I know for sure that those work. More things
can go wrong with spot instances, so I'd try a non-spot cluster first.

I just spun up a spot instance cluster with no problems, so it should work
for you. If you get things working, I'd love to know what you did to fix
it.

Alex


On Mon, Sep 16, 2013 at 3:22 PM, Steve Heistand <steve.heistand_at_nasa.gov> wrote:

> hi folks,
>
> so I'm playing around with the VPC version of StarCluster and in the
> process have gotten things a little wedged. Errors pop up in the start
> process that I fix, but at some point a cluster got up enough to be
> "alive" but not really working. I had to clean things up via the AWS
> console, and now I can't clean up what StarCluster thinks is going on:
>
> # starcluster listclusters
> StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> *** WARNING - Setting 'EC2_PRIVATE_KEY' from environment...
> *** WARNING - Setting 'EC2_CERT' from environment...
> ----------------------------
> hos (security group: sc-hos)
> ----------------------------
> Launch time: N/A
> Uptime: N/A
> Zone: N/A
> Keypair: N/A
> EBS volumes: N/A
> !!! ERROR - InvalidPermission.NotFound: The specified rule does not exist in this security group.
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cli.py", line 274, in main
>     sc.execute(args)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/commands/listclusters.py", line 36, in execute
>     show_ssh_status=self.opts.show_ssh_status)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 331, in list_clusters
>     spot_reqs = cl.spot_requests
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 777, in spot_requests
>     filters = {'launch.group-id': self.cluster_group.id,
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/cluster.py", line 684, in cluster_group
>     static.WORLD_CIDRIP)
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/ec2/securitygroup.py", line 222, in revoke
>     src_group_group_id)
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/ec2/connection.py", line 2634, in revoke_security_group
>     params, verb='POST')
>   File "/usr/local/lib/python2.7/dist-packages/boto-2.9.9-py2.7.egg/boto/connection.py", line 1115, in get_status
>     raise self.ResponseError(response.status, response.reason, body)
> EC2ResponseError: EC2ResponseError: 400 Bad Request
> <?xml version="1.0" encoding="UTF-8"?>
> <Response><Errors><Error><Code>InvalidPermission.NotFound</Code><Message>The specified rule does not exist in this security group.</Message></Error></Errors><RequestID>d4d51a6a-cb7d-471f-b1aa-54a4ceb35ede</RequestID></Response>
>
>
> Is there a database that can be modified to clean up the remnants of this
> bad cluster, or some other way of cleaning things up?
>
> The "starcluster terminate -f cluster_name" command also fails in various
> and exciting ways.
>
> thanks
>
> s
>
> --
> ************************************************************************
> Steve Heistand NASA Ames Research Center
> email: steve.heistand_at_nasa.gov Steve Heistand/Mail Stop 258-6
> ph: (650) 604-4369 Bldg. 258, Rm. 232-5
> Scientific & HPC Application P.O. Box 1
> Development/Optimization Moffett Field, CA 94035-0001
> ************************************************************************
> "Any opinions expressed are those of our alien overlords, not my own."
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
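The error body quoted above is the raw EC2 API response. Read alongside the traceback, it looks like listclusters reached `cluster_group`, which tried to revoke a rule (the call passing `static.WORLD_CIDRIP`) that was no longer in the group, and EC2 answered with `InvalidPermission.NotFound`. A small standard-library sketch for pulling the code out of such a body follows; `parse_ec2_error` and `is_missing_rule` are hypothetical names, not StarCluster or boto functions.

```python
import xml.etree.ElementTree as ET


def parse_ec2_error(body):
    """Return (code, message, request_id) from an EC2 error response body."""
    root = ET.fromstring(body)
    # EC2 wraps each failure as Response/Errors/Error with Code and Message.
    error = root.find('Errors/Error')
    return (error.findtext('Code'), error.findtext('Message'),
            root.findtext('RequestID'))


def is_missing_rule(body):
    """True when EC2 rejected a revoke because the rule was already gone."""
    return parse_ec2_error(body)[0] == 'InvalidPermission.NotFound'
```

A check like `is_missing_rule` is one way a cleanup script could treat an already-removed rule as harmless instead of aborting, though whether that is the right behavior for StarCluster itself is a separate question.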
Received on Wed Oct 02 2013 - 12:31:06 EDT
This archive was generated by hypermail 2.3.0.
