Version History

0.94

News

Version 0.94 of StarCluster is a feature release that fixes many bugs. See the “Features” and “Bug Fixes” sections below for details. This release is not compatible with any active clusters that were created using any previous version of StarCluster. Please terminate all existing clusters before upgrading to 0.94.

Features

  • Support for hs1.8xlarge, hi1.4xlarge, and m3 instance types
  • Added support for userdata scripts (new cluster setting: userdata_scripts)
  • Add all ephemeral drives for any instance type when launching instances
  • Updated base AMIs to latest 12.04 versions
  • OGS (formerly SGE): set default parallel environment allocation_rule to $fill_up
  • Added authentication agent forwarding to sshmaster, sshnode, and sshinstance
  • Support for loading config based on STARCLUSTER_CONFIG environment variable
  • New plugin to install python packages with pip (pypkginstaller plugin)
  • New plugin that either copies or autogenerates a boto config for the cluster user
  • Added –force option to terminate/stop commands
  • Add tags and virtualization type to listinstance output
  • Add –zone (-z) option to spothistory command

Bug Fixes

  • sshutils: source /etc/profile by default
  • awsutils: validate AWS SSL certificates by default
  • awsutils: refetch group before calling authorize
  • awsutils: use 3 decimal places in spot history prices
  • cluster: fix thread leak when running plugins
  • cluster: store cluster metadata in userdata/tags
  • cluster: dont launch master as spot when cluster_size=1
  • cluster: reraise uncaught exceptions in run_plugins
  • cluster: fix instance filtering for default VPC
  • cluster: only clean-up placement groups in supported regions
  • clustersetup: ebs volume mounting improvements
  • clustersetup: raise exception if user’s $HOME is owned by root
  • improve ebs volume device and partition detection
  • config: allow inline comments in the config file
  • config: support [vol] shortcut in config
  • config: document/warn where/when env overrides cfg
  • threadpool: clear the exceptions queue in wait()
  • node: fix Node.root_device_name for misconfigured AMIs
  • node: fix ~/.ssh ownership in generate_key_for_user
  • sge: Export DRMAA_LIBRARY_PATH for Python DRMAA
  • cli: use logging module to log/show exceptions
  • cli: write header to sys.stderr
  • logger: move console logs from stderr to stdout
  • logger: output error messages to stderr
  • logger: remove in-memory log handler (memory-leak fix)
  • fix for DependencyViolation when deleting security group
  • replace group-name filter with instance.group-name
  • spothistory: show current spot price instead of oldest

resizevol/createvol:

  • call e2fsck before resize2fs if needed
  • cleanup vol on failure in resize()
  • fix bug in _warn_about_volume_hosts
  • fix bug in sfdisk repartitioning
  • set default instance type to t1.micro for volume hosts
  • fix bug when cleaning up failed volumes
  • use force_spot_master when launching cluster

ebsimage/s3image:

  • fetch root device from instance metadata
  • only wait for snapshot if root device is in bmap
  • show progress bar for CreateImage snapshot
  • remove ssh host keys in clean_private_data
  • use ec2.get_snapshot to fetch snapshots

loadbalance:

  • collect all queue info from qstat
  • fix logic in SGEStats._count_tasks
  • fix node count in _eval_remove_node
  • get slot calculations from all.q
  • handle qacct error on fresh SGE install
  • improve qstat parsing
  • maintain a single instance of SGEStats
  • move static settings to __init__
  • remove bounds on interval and wait_time
  • remove estimated time to completion calcs
  • replace ‘qlen > ts’ condition in _eval_add_node
  • require positive ints for numeric options
  • take slots into account in _eval_add_node
  • update sge tests for latest parser changes
  • use # of nodes for max_nodes check (not exec hosts)
  • don’t wait for cluster to stabilize to add a node when there are no slots
  • immediately add a node when jobs are in the queue and no slots exists

IPython plugin:

  • Add section on engine restart + note on growing new engines
  • ipcluster: raise exception if IPython isnt installed
  • Better docstrings for the stop and restart plugins
  • Better packer configuration support + updated doc
  • Make it possible to configure a custom notebook folder
  • Make it possible to restart engine, better logs, make msgpack setup explicit
  • Open the controller ports for IPython 0.13.1
  • Use the cluster thread pool to start the engines on the nodes
  • Update the documentation according to tests done on 0.13.1

0.93.3

News

Version 0.93.3 of StarCluster is a patch release that fixes several bugs. See the “Bug Fixes” section below for details.

  • Bumped boto dep to 2.3.0
  • Updated and improved default config template

Bug Fixes

  • sge: fix bug where master could never be an exec host even when toggling the master_is_exec_host plugin setting (issue 89)
  • fix bug when setting custom SSH permission without cidr_ip (issue 91)
  • start-cmd: fix bug where refresh_interval is always None if [global] is not defined in the config (issue 90)
  • dont allow cluster_user=root when validating cluster settings (issue 92)
  • fix another textwrap error with python 2.5

0.93.2

News

Version 0.93.2 of StarCluster fixes several bugs and adds a couple new features. See the “Bug Fixes” section below for details.

  • Amazon recently announced a new instance type m1.medium which is now supported. Amazon also enabled the m1.small and c1.medium instance types to be compatible with both 32bit and 64bit AMIs now. StarCluster now supports the new m1.medium instance type as well as the new changes to m1.small and c1.medium.
  • Switched from paramiko to actively maintained fork called ssh (1.7.13)
  • loadbalancer: --kill-master option renamed to --kill-cluster
  • Bumped boto dep to 2.2.2
  • Bumped jinja dep to 2.6

Features

  • SGE setup code has been moved to a plugin that is executed by default (no configuration changes required). This plugin supports an optional setting master_is_exec_host which controls whether or not the master node executes jobs. This is set to True by default. To change this you will need to set disable_queue=True in your cluster config and then manually define and add the SGE plugin in order to configure the new plugin’s settings. See Sun Grid Engine Plugin for more details.
  • Added the following new options to addnode command: --image-id, --instance-type, --availability-zone, --bid. Thanks to Dan Yamins for his contributions. (issue 76)
  • New CreateUsers plugin that creates multiple cluster users and configures passwordless SSH. Either number of users or a list of usernames can be specified. Also has options to tar and download all SSH keys for generated users. See Create Users Plugin for more details.
  • Added -X option to sshmaster and sshnode commands for forwarding X11. Can be used both when logging in interactively and when executing remote commands.

Bug Fixes

  • catch Garbage packet received exception and warn users about incompatible AMI
  • createvolume: support new /dev/xvd* device names
  • fix textwrap error with python 2.5 (issue 85)
  • sshmaster, sshnode: load /etc/profile by default when executing remote cmds
  • terminate: handle Ctrl-D gracefully when prompting for input (issue 80)
  • terminate: ignore keys when terminating manually
  • validate key_location is readable (issue 77)
  • relax validation constraints for running instances when using starcluster start -x (issue 78)
  • fix cluster validation for t1.micro AMIs/instance types
  • loadbalance: fix _get_stats when no exec hosts
  • loadbalance: respect min_nodes when removing nodes (issue 75)
  • loadbalance: fix -K option to kill cluster when queue is empty (issue 86)
  • loadbalance: retry fetching stats 5x before failing (issue 65)
  • skip NFS exports setup when there are no worker nodes
  • sge: configure SGE profile on all nodes concurrently
  • sge: use qconf’s -*attr commands to update cluster config (replaced a bunch of ugly code that parsed SGE’s console output)
  • add Changelog to MANIFEST.in in order to include Changelog in future release dists

0.93.1

News

Warning

It is recommended that you upgrade as soon as possible.

Version 0.93.1 of StarCluster is largely a patch release that fixes several bugs. See the “Bug Fixes” section below for details.

Also, the put and get commands have been marked stable which means they no longer require ENABLE_EXPERIMENTAL=True in the config in order to use them.

Features

  • Added sys.argv to crash reports for better details when things go wrong
  • Implemented on_add_node and on_remove_node for TMUX plugin to add/remove nodes from the TMUX control-center sessions for both root and CLUSTER_USER
  • Implemented on_add_node for the Ubuntu package installer plugin (pkginstaller) to ensure new nodes also install the required packages
  • Implemented on_add_node for the MPICH2 plugin to ensure new nodes are configured for MPICH2 and also added to the hosts file. Moved the MPICH2 hosts file to an NFS-shared location (/home/mpich2.hosts) for faster configuration
  • Implemented on_add_node for the Xvfb plugin that installs and runs Xvfb on newly added nodes

Bug Fixes

  • Fixed major NFS bug in addnode command where /etc/exports was being wiped each time a new node was added. This caused the last-to-be-added node to be the only node correctly configured for NFS in the entire cluster.
  • Updated all plugins which subclass DefaultClusterSetup to implement on_add_node and on_remove_node regardless of whether or not they actually use them to avoid the ‘default’ plugin’s implementation being incorrectly called multiple times by plugins when adding/removing nodes using the addnode and removenode commands
  • Fixed string formatting bug in Condor plugin

0.93

News

New StarCluster Ubuntu 11.10 EBS AMIs available including a new HVM AMI for use with CC/CC2/CG instance types! Includes OpenGridScheduler (to replace SGE), Condor, Hadoop/Dumbo, MPICH2, OpenMPI, IPython 0.12 (parallel+notebook), and more. Run dpkg -l on the AMIs for the full list of packages.

The new StarCluster Ubuntu 11.10 EBS HVM AMI additionally comes with NVIDIA driver, CUDA Toolkit/SDK, PyCuda, PyOpenCL, and Magma for GPU computing.

These new AMIs are available in all AWS regions. The new us-east-1 11.10 EBS AMIs are now the defaults in the StarCluster config template. Use the listpublic command to obtain a list of available StarCluster AMIs in other regions:

$ starcluster -r sa-east-1 listpublic

Note

The HVM AMI is currently only available in the us-east-1 region until other regions support HVM.

Features

  • Added support for new CC2 instance types

  • Added support for specifying a proxy host to use when connecting to AWS:

    [aws info]
    aws_proxy = your.proxy.com
    aws_proxy_port = 8080
    aws_proxy_user = yourproxyuser
    aws_proxy_pass = yourproxypass

    See Using a Proxy Host for more details

  • Updated IPCluster plugin to support IPython 0.12 with parallel and notebook support (SSL+password auth). See IPython Cluster Plugin for more details

  • Keypairs are now validated against EC2 fingerprint which notifies users ahead of time if the file they specified in KEY_LOCATION doesn’t match the EC2 keypair specified

  • Support for Windows (tested on 32bit Windows 7). See Installing on Windows for more details

New Plugins

  • Condor - experimental support for creating a Condor pool on StarCluster. See Condor Plugin for more details
  • Hadoop - support for creating a Hadoop cluster with StarCluster See Hadoop Plugin for more details
  • MPICH2 - configure cluster to use MPICH2 instead of OpenMPI See MPICH2 Plugin for more details
  • TMUX - configures cluster SSH “dashboard” in tmux See TMUX Plugin for more details
  • Package installer - install Ubuntu packages on all nodes concurrently See Package Installer Plugin for more details
  • Xvfb - install and configure an Xvfb session on all nodes (sets DISPLAY=:1) See Xvfb Plugin for more details
  • MySQL - install and configure a MySQL cluster on StarCluster. See MySQL Cluster Plugin for more details

Bug Fixes

  • Fix bug in put command where in random cases the last few bytes of the file weren’t being transferred
  • Raise an error if placement group creation fails
  • Fix bug in terminate command in the case that nodes are already shutting-down when the terminate command is executed
  • Fix bug in s3image when destination S3 bucket already exists
  • Fix known_hosts warnings when ssh’ing from node to node internally
  • Fix trivial logging bug in createvolume command

0.92.1

release-date:11-05-2011

Features

  • Support for splitting the config into an arbitrary set of files:

    [global]
    include=~/.starcluster/awscreds, ~/.starcluster/myconf

    See Splitting the Config into Multiple Files for more details

  • createvolume: support naming/tagging newly created volumes:

    $ starcluster createvolume --name mynewvol 30 us-east-1d

    See Create and Format a new EBS Volume for more details

  • listvolumes: add support for filtering by tags:

    $ starcluster listvolumes --name mynewvol
    $ starcluster listvolumes --tag mykey=myvalue

    See Managing EBS Volumes for more details

  • sshmaster, sshnode, sshinstance: support for running remote commands from command line:

    $ starcluster sshmaster mycluster 'cat /etc/fstab'
    $ starcluster sshnode mycluster node001 'cat /etc/fstab'
    $ starcluster sshinstance i-99999999 'cat /etc/hosts'

    See Running Remote Commands from Command Line for more details

Bug Fixes

The following bugs were fixed in this release:

spothistory command

  • add package_data to sdist in order to include the necessary web media and templates needed for the --plot feature. The previous 0.92 version left these out and thus the --plot feature was broken. This should be fixed.
  • fix bug when launching default browser on mac

start command

  • fix bug in option completion when using the start command’s --cluster-template option

terminate command

  • fix bug in terminate cmd when region != us-east-1

listkeypairs command

  • fix bug in list_keypairs when no keys exist

Improvements

  • listinstances: add ‘state_reason’ msg to output if available
  • add system info, Python info, and package versions to crash-report
  • listregions: sort regions by name
  • improved bash/zsh completion support. completion will read from the correct config file, if possible, in the case that the global -c option is specified while completing.
  • separate the timing of cluster setup into time spent on waiting for EC2 instances to come up and time spent configuring the cluster after all instances are up and running. this is useful when profiling StarCluster’s performance on large (100+ node) clusters.

0.92

release-date:10-17-2011

News

Note

Before upgrading please be aware that rc1 does not support clusters created with previous versions and will print a warning along with instructions on how to proceed if it finds old clusters. With that said, it’s best not to upgrade until you’ve terminated any currently running clusters created with an old version. Also, the @sc-masters group is no longer needed and can be removed after upgrading by running “starcluster terminate masters”

You can find the new docs for 0.92rc1 here:

http://star.mit.edu/cluster/docs/0.92rc1/

NOTE: these docs are still very much a work in progress. If anyone is interested in helping to improve the docs, please fork the project on github and submit pull requests. Here’s a guide to get you started:

http://star.mit.edu/cluster/docs/0.92rc1/contribute.html

Please upgrade at your leisure, test the new release candidate, and submit any bug reports preferably on StarCluster’s github issue tracker or this mailing list.

Features

  • Cluster Compute/GPU Instances- Added support for the new Cluster Compute/GPU instance types. Thanks to Fred Rotbart for his contributions

  • EBS-Backed Clusters- Added support for starting/stopping EBS-backed clusters on EC2. The stop command now stops (instead of terminate) an EBS-backed cluster and terminates an S3-backed cluster. Added a terminate command that terminates any cluster

  • Improved performance - using workerpool library (http://pypi.python.org/pypi/StarCluster) to configure nodes concurrently

  • New ebsimage and s3image commands for easily creating new EBS-backed AMIs. Each command supports creating a new AMI from S3 and EBS-backed image hosts respectively. (NOTE: createimage has been renamed to s3image. you can still call createimage but this will go away in future releases)

  • Add/Remove Nodes - added new addnode and removenode commands for adding/removing nodes to a cluster and removing existing nodes from a cluster

  • Restart command - Added new restart command that reboots the cluster and reconfigures the cluster

  • Create Keypairs - Added ability to add/list/remove keypairs

  • Elastic Load Balancing - Support for shrinking/expanding clusters based on Sun Grid Engine queue statistics. This allow the user to start a single-node cluster (or larger) and scale the number of instances needed to meet the current SGE queue load. For example, a single-node cluster can be launched and as the queue load increases new EC2 instances are launched, added to the cluster, used for computation, and then removed when they’re idle. This minimizes the cost of using EC2 for an unknown and on-demand workload. Thanks to Rajat Banerjee (rqbanerjee)

  • Security Group Permissions - Added ability to specify permission settings to be applied automatically to a cluster’s security group after it’s been started

  • Multiple Instance Types - Added support for specifying instance types on a per-node basis. Thanks to Dan Yamins for his contributions

  • Unpartitioned Volumes- StarCluster now supports both partitioned and unpartitioned EBS volumes

  • New Plugin Hooks - Plugins can now play a part when adding or removing a node as well as when restarting or shutting down the entire cluster by implementing the on_remove_node, on_add_node, on_shutdown, and on_reboot methods respectively

  • Added a runplugin command and fixed the run_plugin() method in the development shell. You can now run a plugin in the dev shell like so:

    ~[1]> cm.run_plugin('myplugin', 'mycluster')
    
  • Added support for easily switching regions (without config changes) using a global -z option. For example:

    $ starcluster -r eu-west-1 listclusters
    StarCluster - (http://star.mit.edu/cluster) (v. 0.92rc1)
    Software Tools for Academics and Researchers (STAR)
    Please submit bug reports to starcluster@mit.edu
    
    >>> No clusters found...
  • Added new resizevolume command for easily resizing existing EBS volumes

  • Added listregions/listzones commands