Re: "connection closed" when running "starcluster sshmaster"
The SSH daemon is responding (and the EC2 security group is not
blocking traffic), which is good.
However, if logging onto the master was working a few hours ago and
no longer works, try logging onto the Grid Engine execution node with,
for example, "starcluster sshnode rps_cluster node001". If SSHing into
the execution node works, then the issue is likely with the
StarCluster master instance itself.
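
If you want to check the TCP side without StarCluster in the loop,
something like the Python sketch below might help. It is not part of
StarCluster; the function name probe_ssh and its return strings are
made up here for illustration. It only opens a socket to port 22 and
looks at what comes back, so it cannot tell you why the daemon drops
the session. Any result other than "unreachable" suggests that traffic
is getting through the security group to something listening on the
instance.

```python
import socket


def probe_ssh(host, port=22, timeout=5.0):
    """Classify how an SSH endpoint responds at the TCP level.

    Hypothetical helper, returns one of:
      "unreachable"  - TCP connect failed or timed out
                       (blocked traffic, host down, nothing listening)
      "no_banner"    - connected, but nothing was sent before the timeout
      "closed_early" - connected, but the peer closed or reset the
                       connection before sending anything
      "banner: ..."  - the peer sent an identification string
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"

    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            return "no_banner"
        except OSError:
            return "closed_early"

    if not data:
        return "closed_early"
    return "banner: " + data.decode("ascii", "replace").strip()


# Example (hostname taken from the listclusters output quoted below):
# print(probe_ssh("ec2-54-204-55-67.compute-1.amazonaws.com"))
```

Running the same probe against node001 gives a quick comparison
between the two instances.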
Rayson
==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
On Thu, Jan 16, 2014 at 4:55 PM, Signell, Richard <rsignell_at_usgs.gov> wrote:
> I set up a machine this morning and
> starcluster sshmaster rps_cluster
> was working fine to ssh in.
>
> But now I'm getting "Connection closed by 54.204.55.67"
>
> It seems that the cluster is running:
>
> rsignell@gam:~$ starcluster listclusters
> StarCluster - (http://star.mit.edu/cluster) (v. 0.94.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> ---------------------------------------------
> rps_cluster (security group: @sc-rps_cluster)
> ---------------------------------------------
> Launch time: 2014-01-16 08:18:09
> Uptime: 0 days, 08:34:07
> Zone: us-east-1a
> Keypair: mykey2
> EBS volumes: N/A
> Cluster nodes:
> master running i-7950f657 ec2-54-204-55-67.compute-1.amazonaws.com
> node001 running i-7a50f654 ec2-54-196-2-68.compute-1.amazonaws.com
> Total nodes: 2
>
> And I don't see anything obvious in the verbose debug output:
>
> rsignell@gam:~$ starcluster -d sshmaster rps_cluster
> StarCluster - (http://star.mit.edu/cluster) (v. 0.94.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> 2014-01-16 16:53:13,515 config.py:567 - DEBUG - Loading config
> 2014-01-16 16:53:13,515 config.py:138 - DEBUG - Loading file:
> /home/rsignell/.starcluster/config
> 2014-01-16 16:53:13,517 config.py:322 - DEBUG - include setting not
> specified. Defaulting to []
> 2014-01-16 16:53:13,518 config.py:322 - DEBUG - web_browser setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,518 config.py:322 - DEBUG - refresh_interval
> setting not specified. Defaulting to 30
> 2014-01-16 16:53:13,518 config.py:322 - DEBUG - include setting not
> specified. Defaulting to []
> 2014-01-16 16:53:13,518 config.py:322 - DEBUG - web_browser setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,519 config.py:322 - DEBUG - refresh_interval
> setting not specified. Defaulting to 30
> 2014-01-16 16:53:13,519 config.py:322 - DEBUG - aws_proxy_pass setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,519 config.py:322 - DEBUG - aws_validate_certs
> setting not specified. Defaulting to True
> 2014-01-16 16:53:13,520 config.py:322 - DEBUG - aws_ec2_path setting
> not specified. Defaulting to /
> 2014-01-16 16:53:13,520 config.py:322 - DEBUG - aws_region_name
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,521 config.py:322 - DEBUG - aws_region_host
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,521 config.py:322 - DEBUG - aws_s3_path setting
> not specified. Defaulting to /
> 2014-01-16 16:53:13,521 config.py:322 - DEBUG - aws_proxy_user setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,521 config.py:322 - DEBUG - aws_is_secure setting
> not specified. Defaulting to True
> 2014-01-16 16:53:13,522 config.py:322 - DEBUG - aws_s3_host setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,522 config.py:322 - DEBUG - aws_port setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,522 config.py:322 - DEBUG - ec2_private_key
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,522 config.py:322 - DEBUG - ec2_cert setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,523 config.py:322 - DEBUG - aws_proxy setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,523 config.py:322 - DEBUG - aws_proxy_port setting
> not specified. Defaulting to None
> 2014-01-16 16:53:13,523 config.py:322 - DEBUG - device setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,523 config.py:322 - DEBUG - partition setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,524 config.py:322 - DEBUG - device setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,524 config.py:322 - DEBUG - partition setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,525 config.py:322 - DEBUG - disable_queue setting
> not specified. Defaulting to False
> 2014-01-16 16:53:13,525 config.py:322 - DEBUG - volumes setting not
> specified. Defaulting to []
> 2014-01-16 16:53:13,525 config.py:322 - DEBUG - availability_zone
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - spot_bid setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - master_instance_type
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - disable_cloudinit
> setting not specified. Defaulting to False
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - force_spot_master
> setting not specified. Defaulting to False
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - extends setting not
> specified. Defaulting to None
> 2014-01-16 16:53:13,526 config.py:322 - DEBUG - master_image_id
> setting not specified. Defaulting to None
> 2014-01-16 16:53:13,527 config.py:322 - DEBUG - userdata_scripts
> setting not specified. Defaulting to []
> 2014-01-16 16:53:13,527 config.py:322 - DEBUG - permissions setting
> not specified. Defaulting to []
> 2014-01-16 16:53:13,529 awsutils.py:74 - DEBUG - creating self._conn
> w/ connection_authenticator kwargs = {'proxy_user': None,
> 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure':
> True, 'path': '/', 'region': None, 'validate_certs': True, 'port':
> None}
> 2014-01-16 16:53:13,872 cluster.py:711 - DEBUG - existing nodes: {}
> 2014-01-16 16:53:13,872 cluster.py:719 - DEBUG - adding node
> i-7a50f654 to self._nodes list
> 2014-01-16 16:53:13,873 cluster.py:719 - DEBUG - adding node
> i-7950f657 to self._nodes list
> 2014-01-16 16:53:13,873 cluster.py:727 - DEBUG - returning self._nodes
> = [<Node: master (i-7950f657)>, <Node: node001 (i-7a50f654)>]
> 2014-01-16 16:53:14,063 cluster.py:711 - DEBUG - existing nodes:
> {u'i-7a50f654': <Node: node001 (i-7a50f654)>, u'i-7950f657': <Node:
> master (i-7950f657)>}
> 2014-01-16 16:53:14,064 cluster.py:714 - DEBUG - updating existing
> node i-7a50f654 in self._nodes
> 2014-01-16 16:53:14,064 cluster.py:714 - DEBUG - updating existing
> node i-7950f657 in self._nodes
> 2014-01-16 16:53:14,064 cluster.py:727 - DEBUG - returning self._nodes
> = [<Node: master (i-7950f657)>, <Node: node001 (i-7a50f654)>]
> 2014-01-16 16:53:14,168 node.py:1039 - DEBUG - Using native OpenSSH client
> 2014-01-16 16:53:14,169 node.py:1050 - DEBUG - ssh_cmd: ssh -i
> /home/rsignell/.ssh/mykey2.rsa
> root@ec2-54-204-55-67.compute-1.amazonaws.com
> Connection closed by 54.204.55.67
>
>
> I didn't see any "common problems" or "troubleshooting" sections in
> the starcluster documentation, and I checked the FAQ and the mailing
> list archives, but I probably overlooked something, as this certainly
> seems like a newbie question (which I am).
>
> Thanks,
> Rich
> --
> Dr. Richard P. Signell (508) 457-2229
> USGS, 384 Woods Hole Rd.
> Woods Hole, MA 02543-1598
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Thu Jan 16 2014 - 17:25:02 EST