Version History

0.95.2

News

Version 0.95.2 of StarCluster is a bug fix release. Please note that this version is not compatible with clusters started with older versions of StarCluster. Please terminate all clusters before upgrading to 0.95.2.

Bug Fixes

  • cluster: sort chunked cluster tags before loading (gh-371)
  • config: set public_ip setting type to boolean (gh-366)

0.95.1

News

Version 0.95.1 of StarCluster is a bug fix release. Please note that this version is not compatible with clusters started with older versions of StarCluster. Please terminate all clusters before upgrading to 0.95.1.

Bug Fixes

  • cluster: fix max length error when applying cluster tags (gh-348)
  • start: fix default value for --no-public-ips option (gh-366)
  • cluster: fix for custom security group permissions (gh-356)

0.95

News

Version 0.95 of StarCluster is a feature release with many enhancements and bug fixes. This release has support for Amazon’s Virtual Private Cloud (VPC). See Using the Virtual Private Cloud (VPC) for details on using StarCluster with VPC. This release comes with new Ubuntu 13.04 AMIs for 32bit and 64bit platforms as well as an HVM AMI that comes with drivers for GPU and enhanced networking. The HVM AMI also comes with the CUDA toolkit, PyCuda, MAGMA, and OGS GPU sensor. These new AMIs are available in all AWS regions. Please use the listpublic command to find them:

$ starcluster listpublic

The sections below list the full set of features, enhancements, and bug fixes for this release. Please note that this version is not compatible with clusters started with older versions of StarCluster. Please terminate all clusters before upgrading to 0.95.

Features

  • Add Virtual Private Cloud (VPC) support
  • Update base StarCluster AMI ids to 13.04
  • Add support for new C3 instance types
  • Add support for new GRID GPU g2.2xlarge (non-GPGPU) instance type
  • Add support for m3.{medium,large} instance types
  • Add support for new I2 instance family
  • start: add --subnet (-N) and --no-public-ips option (VPC-only)
  • start: add --dns-prefix (-P) option to prefix hostnames with the cluster tag
  • removenode: added --force (-f) option to removenode
  • removenode: prompt for confirmation and add --confirm (-c) option
  • sshmaster, sshnode: add --pseudo-tty (-t) option
  • createkey: add --import-key (-i) option
  • cluster: validate imported keypair fingerprints
  • spothistory: add --vpc and --classic options
  • spothistory: use classic or vpc prices based on default vpc
  • hadoop: tune number of map/reduce tasks to number of CPUs
  • hadoop: document all advanced settings
  • listclusters, listinstances: show vpc and subnet ids

Enhancements

  • removenode: update usage to mimic addnode (ie added -n and -a options similar to addnode) - old usage works but is deprecated
  • Removed internal scp module in favor of @jbardin’s scp module
  • awsutils: wait for placement and security group deletion propagation in ec2.delete_group
  • node: don’t force NFSv3 - use OS default (NFSv4 on recent AMIs)
  • ebsimage: reduce rsync output by switching -v to -q (s3->ebs)
  • ebsimage: log the ebs volume formatting step
  • volume: log the exception when volume creation fails
  • loadbalance: set UTC tzinfo in all datetime objects
  • utils: use UTC timezone for all datetime objects
  • cluster: deprecated all get_node_by* calls in favor of cluster.get_node
  • awsutils: raise exc in EasyEC2.wait_for_propagation after max_retries
  • Use pytest to run tests

Bug Fixes

  • node: fix stale NFS entries from force-terminated spot nodes
  • node: remove ‘user’ setting from NFS mount options
  • node: add work-around for bug in debian nfs-kernel-server init script in Ubuntu 13.04+
  • node: use apt-get instead of apt in package_provider property
  • node: run apt-get update before install in apt_install
  • cluster: always use master’s zone if not specified
  • cluster: store availability_zone in core cluster settings tag
  • cluster: raise error if MOUNT_PATH=/
  • cluster: retry deleting placement group until successful
  • cluster: check for master in remove_nodes prior to removing any nodes
  • loadbalance: add nodes when below minimum
  • loadbalance: raise an exception if min_nodes > max_nodes in loadbalancer
  • loadbalance: fix crash in get_all_stats() when arr is empty
  • ebsimage: set fs label = / on root vol when converting s3->ebs
  • ebsimage: fix for bmap when converting s3->ebs
  • ebsimage: only set 1 ephemeral drive when converting s3->ebs
  • volume: only detach if vol != available in _delete_new_volume
  • volume: remove duplicates from volume hosts list
  • spothistory: improve accuracy by not using a possibly out-of-sync local time stamp for the end date as long as -e is not specified
  • spothistory: set UTC tzinfo on all datetime objects
  • awsutils: raise exception if keypair exists in create_keypair
  • awsutils: use instance store bmap for 32bit m1.small
  • awsutils: dont specify ephemeral drives w/ t1.micro
  • cli: raise parser error if tag is incorrectly specified
  • condor: fetch & create condor dirs from config
  • condor: set SCHEDD_NAME on each host
  • condor: improve dedicated scheduler config

0.94.3

News

Version 0.94.3 of StarCluster is a bug fix release. See the “Bug Fixes” sections below for details. In general it’s recommended to terminate all clusters before upgrading to new versions of StarCluster although this is not always strictly required.

Bug Fixes

  • Fixed ‘InvalidSnapshot.NotFound’ errors on cluster launch
  • Fixed issue with Server.InternalError: Internal error on launch when using m1.small instance type with a 32bit AMI.
  • Updated default AMIs in config template to 64bit AMIs

0.94.2

News

Version 0.94.2 of StarCluster is a bug fix release. See the “Bug Fixes” sections below for details. In general it’s recommended to terminate all clusters before upgrading to new versions of StarCluster although this is not always strictly required.

Bug Fixes

  • node: fixed regression where the instance’s connection object was not being reused which was causing issues for folks using regions other than the default us-east-1.

0.94.1

News

Version 0.94.1 of StarCluster is a bug fix release. See the “Features” and “Bug Fixes” sections below for details. In general it’s recommended to terminate all clusters before upgrading to new versions of StarCluster although this is not always strictly required.

Features

  • restart: add --reboot-only option
  • awsutils: force delete_on_termation on root volume when launching instances
  • delay using optional imports in starcluster.utils until needed (reduces the time it takes to load StarCluster)

Bug Fixes

  • cluster: wait for all instance and spot requests to propagate (fixes ‘instance not found’ errors and ‘missing’ spot requests when waiting for cluster to come up)
  • cluster: terminate any instance that’s been pending for more than 15 minutes (fixes infinite wait due to a forever pending node)
  • s3image: fix block device map issue when host instance is EBS-backed
  • awsutils: fix custom block device map for S3 instances/AMIs
  • sshnode: fix keypair validation bug
  • node: fix for duplicate entries in known_hosts file
  • node: fix bug where credentials attributes were missing from node.ec2 object
  • loadbalancer: always pass --utc when calling date command
  • loadbalancer: set matplotlib backend to Agg - allows --plot option to work without a display
  • config: convert EC2_CERT and EC2_PRIVATE_KEY in [aws] section to absolute path
  • config: fix typo in config template (PROTOCOL -> IP_PROTOCOL)
  • tmux plugin: fix race condition where new-window -t is called before new-session has finished
  • spothistory: update help docs for --plot option

0.94

News

Version 0.94 of StarCluster is a feature release that fixes many bugs. See the “Features” and “Bug Fixes” sections below for details. This release is not compatible with any active clusters that were created using any previous version of StarCluster. Please terminate all existing clusters before upgrading to 0.94.

Features

  • Support for hs1.8xlarge, hi1.4xlarge, and m3 instance types
  • Added support for userdata scripts (new cluster setting: userdata_scripts)
  • Add all ephemeral drives for any instance type when launching instances
  • Updated base AMIs to latest 12.04 versions
  • OGS (formerly SGE): set default parallel environment allocation_rule to $fill_up
  • Added authentication agent forwarding to sshmaster, sshnode, and sshinstance
  • Support for loading config based on $STARCLUSTER_CONFIG environment variable
  • New plugin to install python packages with pip (pypkginstaller plugin)
  • New plugin that either copies or autogenerates a boto config for the cluster user
  • Added --force option to terminate/stop commands
  • Add tags and virtualization type to listinstance output
  • Add --zone (-z) option to spothistory command

Bug Fixes

  • sshutils: source /etc/profile by default
  • awsutils: validate AWS SSL certificates by default
  • awsutils: refetch group before calling authorize
  • awsutils: use 3 decimal places in spot history prices
  • cluster: fix thread leak when running plugins
  • cluster: store cluster metadata in userdata/tags
  • cluster: dont launch master as spot when CLUSTER_SIZE=1
  • cluster: reraise uncaught exceptions in run_plugins
  • cluster: fix instance filtering for default VPC
  • cluster: only clean-up placement groups in supported regions
  • clustersetup: ebs volume mounting improvements
  • clustersetup: raise exception if user’s $HOME is owned by root
  • improve ebs volume device and partition detection
  • config: allow inline comments in the config file
  • config: support [vol] shortcut in config
  • config: document/warn where/when env overrides cfg
  • threadpool: clear the exceptions queue in wait()
  • node: fix Node.root_device_name for misconfigured AMIs
  • node: fix ~/.ssh ownership in generate_key_for_user
  • sge: Export $DRMAA_LIBRARY_PATH for Python DRMAA
  • cli: use logging module to log/show exceptions
  • cli: write header to sys.stderr
  • logger: move console logs from stderr to stdout
  • logger: output error messages to stderr
  • logger: remove in-memory log handler (memory-leak fix)
  • fix for DependencyViolation when deleting security group
  • replace group-name filter with instance.group-name
  • spothistory: show current spot price instead of oldest

resizevol/createvol:

  • call e2fsck before resize2fs if needed
  • cleanup vol on failure in resize()
  • fix bug in _warn_about_volume_hosts
  • fix bug in sfdisk repartitioning
  • set default instance type to t1.micro for volume hosts
  • fix bug when cleaning up failed volumes
  • use force_spot_master when launching cluster

ebsimage/s3image:

  • fetch root device from instance metadata
  • only wait for snapshot if root device is in bmap
  • show progress bar for CreateImage snapshot
  • remove ssh host keys in clean_private_data
  • use ec2.get_snapshot to fetch snapshots

loadbalance:

  • collect all queue info from qstat
  • fix logic in SGEStats._count_tasks
  • fix node count in _eval_remove_node
  • get slot calculations from all.q
  • handle qacct error on fresh SGE install
  • improve qstat parsing
  • maintain a single instance of SGEStats
  • move static settings to __init__
  • remove bounds on interval and wait_time
  • remove estimated time to completion calcs
  • replace ‘qlen > ts’ condition in _eval_add_node
  • require positive ints for numeric options
  • take slots into account in _eval_add_node
  • update sge tests for latest parser changes
  • use # of nodes for max_nodes check (not exec hosts)
  • don’t wait for cluster to stabilize to add a node when there are no slots
  • immediately add a node when jobs are in the queue and no slots exists

IPython plugin:

  • Add section on engine restart + note on growing new engines
  • ipcluster: raise exception if IPython isnt installed
  • Better docstrings for the stop and restart plugins
  • Better packer configuration support + updated doc
  • Make it possible to configure a custom notebook folder
  • Make it possible to restart engine, better logs, make msgpack setup explicit
  • Open the controller ports for IPython 0.13.1
  • Use the cluster thread pool to start the engines on the nodes
  • Update the documentation according to tests done on 0.13.1

0.93.3

News

Version 0.93.3 of StarCluster is a patch release that fixes several bugs. See the “Bug Fixes” section below for details.

  • Bumped boto dep to 2.3.0
  • Updated and improved default config template

Bug Fixes

  • sge: fix bug where master could never be an exec host even when toggling the master_is_exec_host plugin setting (issue 89)
  • fix bug when setting custom SSH permission without cidr_ip (issue 91)
  • start-cmd: fix bug where refresh_interval is always None if [global] is not defined in the config (issue 90)
  • dont allow cluster_user=root when validating cluster settings (issue 92)
  • fix another textwrap error with python 2.5

0.93.2

News

Version 0.93.2 of StarCluster fixes several bugs and adds a couple new features. See the “Bug Fixes” section below for details.

  • Amazon recently announced a new instance type m1.medium which is now supported. Amazon also enabled the m1.small and c1.medium instance types to be compatible with both 32bit and 64bit AMIs now. StarCluster now supports the new m1.medium instance type as well as the new changes to m1.small and c1.medium.
  • Switched from paramiko to actively maintained fork called ssh (1.7.13)
  • loadbalancer: --kill-master option renamed to --kill-cluster
  • Bumped boto dep to 2.2.2
  • Bumped jinja dep to 2.6

Features

  • SGE setup code has been moved to a plugin that is executed by default (no configuration changes required). This plugin supports an optional setting master_is_exec_host which controls whether or not the master node executes jobs. This is set to True by default. To change this you will need to set disable_queue=True in your cluster config and then manually define and add the SGE plugin in order to configure the new plugin’s settings. See Sun Grid Engine Plugin for more details.
  • Added the following new options to addnode command: --image-id, --instance-type, --availability-zone, --bid. Thanks to Dan Yamins for his contributions. (issue 76)
  • New CreateUsers plugin that creates multiple cluster users and configures passwordless SSH. Either number of users or a list of usernames can be specified. Also has options to tar and download all SSH keys for generated users. See Create Users Plugin for more details.
  • Added -X option to sshmaster and sshnode commands for forwarding X11. Can be used both when logging in interactively and when executing remote commands.

Bug Fixes

  • catch Garbage packet received exception and warn users about incompatible AMI
  • createvolume: support new /dev/xvd* device names
  • fix textwrap error with python 2.5 (issue 85)
  • sshmaster, sshnode: load /etc/profile by default when executing remote cmds
  • terminate: handle Ctrl-D gracefully when prompting for input (issue 80)
  • terminate: ignore keys when terminating manually
  • validate key_location is readable (issue 77)
  • relax validation constraints for running instances when using starcluster start -x (issue 78)
  • fix cluster validation for t1.micro AMIs/instance types
  • loadbalance: fix _get_stats when no exec hosts
  • loadbalance: respect min_nodes when removing nodes (issue 75)
  • loadbalance: fix -K option to kill cluster when queue is empty (issue 86)
  • loadbalance: retry fetching stats 5x before failing (issue 65)
  • skip NFS exports setup when there are no worker nodes
  • sge: configure SGE profile on all nodes concurrently
  • sge: use qconf’s -*attr commands to update cluster config (replaced a bunch of ugly code that parsed SGE’s console output)
  • add Changelog to MANIFEST.in in order to include Changelog in future release dists

0.93.1

News

Warning

It is recommended that you upgrade as soon as possible.

Version 0.93.1 of StarCluster is largely a patch release that fixes several bugs. See the “Bug Fixes” section below for details.

Also, the put and get commands have been marked stable which means they no longer require ENABLE_EXPERIMENTAL=True in the config in order to use them.

Features

  • Added sys.argv to crash reports for better details when things go wrong
  • Implemented on_add_node and on_remove_node for TMUX plugin to add/remove nodes from the TMUX control-center sessions for both root and CLUSTER_USER
  • Implemented on_add_node for the Ubuntu package installer plugin (pkginstaller) to ensure new nodes also install the required packages
  • Implemented on_add_node for the MPICH2 plugin to ensure new nodes are configured for MPICH2 and also added to the hosts file. Moved the MPICH2 hosts file to an NFS-shared location (/home/mpich2.hosts) for faster configuration
  • Implemented on_add_node for the Xvfb plugin that installs and runs Xvfb on newly added nodes

Bug Fixes

  • Fixed major NFS bug in addnode command where /etc/exports was being wiped each time a new node was added. This caused the last-to-be-added node to be the only node correctly configured for NFS in the entire cluster.
  • Updated all plugins which subclass DefaultClusterSetup to implement on_add_node and on_remove_node regardless of whether or not they actually use them to avoid the ‘default’ plugin’s implementation being incorrectly called multiple times by plugins when adding/removing nodes using the addnode and removenode commands
  • Fixed string formatting bug in Condor plugin

0.93

News

New StarCluster Ubuntu 11.10 EBS AMIs available including a new HVM AMI for use with CC/CC2/CG instance types! Includes OpenGridScheduler (to replace SGE), Condor, Hadoop/Dumbo, MPICH2, OpenMPI, IPython 0.12 (parallel+notebook), and more. Run dpkg -l on the AMIs for the full list of packages.

The new StarCluster Ubuntu 11.10 EBS HVM AMI additionally comes with NVIDIA driver, CUDA Toolkit/SDK, PyCuda, PyOpenCL, and Magma for GPU computing.

These new AMIs are available in all AWS regions. The new us-east-1 11.10 EBS AMIs are now the defaults in the StarCluster config template. Use the listpublic command to obtain a list of available StarCluster AMIs in other regions:

$ starcluster -r sa-east-1 listpublic

Note

The HVM AMI is currently only available in the us-east-1 region until other regions support HVM.

Features

  • Added support for new CC2 instance types

  • Added support for specifying a proxy host to use when connecting to AWS:

    [aws info]
    aws_proxy = your.proxy.com
    aws_proxy_port = 8080
    aws_proxy_user = yourproxyuser
    aws_proxy_pass = yourproxypass
    

    See Using a Proxy Host for more details

  • Updated IPCluster plugin to support IPython 0.12 with parallel and notebook support (SSL+password auth). See IPython Cluster Plugin for more details

  • Keypairs are now validated against EC2 fingerprint which notifies users ahead of time if the file they specified in KEY_LOCATION doesn’t match the EC2 keypair specified

  • Support for Windows (tested on 32bit Windows 7). See Installing on Windows for more details

New Plugins

  • Condor - experimental support for creating a Condor pool on StarCluster. See Condor Plugin for more details
  • Hadoop - support for creating a Hadoop cluster with StarCluster See Hadoop Plugin for more details
  • MPICH2 - configure cluster to use MPICH2 instead of OpenMPI See MPICH2 Plugin for more details
  • TMUX - configures cluster SSH “dashboard” in tmux See TMUX Plugin for more details
  • Package installer - install Ubuntu packages on all nodes concurrently See Package Installer Plugin for more details
  • Xvfb - install and configure an Xvfb session on all nodes (sets DISPLAY=:1) See Xvfb Plugin for more details
  • MySQL - install and configure a MySQL cluster on StarCluster. See MySQL Cluster Plugin for more details

Bug Fixes

  • Fix bug in put command where in random cases the last few bytes of the file weren’t being transferred
  • Raise an error if placement group creation fails
  • Fix bug in terminate command in the case that nodes are already shutting-down when the terminate command is executed
  • Fix bug in s3image when destination S3 bucket already exists
  • Fix known_hosts warnings when ssh’ing from node to node internally
  • Fix trivial logging bug in createvolume command

0.92.1

release-date:11-05-2011

Features

  • Support for splitting the config into an arbitrary set of files:

    [global]
    include=~/.starcluster/awscreds, ~/.starcluster/myconf
    

    See Splitting the Config into Multiple Files for more details

  • createvolume: support naming/tagging newly created volumes:

    $ starcluster createvolume --name mynewvol 30 us-east-1d
    

    See Create and Format a new EBS Volume for more details

  • listvolumes: add support for filtering by tags:

    $ starcluster listvolumes --name mynewvol
    $ starcluster listvolumes --tag mykey=myvalue
    

    See Managing EBS Volumes for more details

  • sshmaster, sshnode, sshinstance: support for running remote commands from command line:

    $ starcluster sshmaster mycluster 'cat /etc/fstab'
    $ starcluster sshnode mycluster node001 'cat /etc/fstab'
    $ starcluster sshinstance i-99999999 'cat /etc/hosts'
    

    See Running Remote Commands from Command Line for more details

Bug Fixes

The following bugs were fixed in this release:

spothistory command

  • add package_data to sdist in order to include the necessary web media and templates needed for the --plot feature. The previous 0.92 version left these out and thus the --plot feature was broken. This should be fixed.
  • fix bug when launching default browser on mac

start command

  • fix bug in option completion when using the start command’s --cluster-template option

terminate command

  • fix bug in terminate cmd when region != us-east-1

listkeypairs command

  • fix bug in list_keypairs when no keys exist

Improvements

  • listinstances: add ‘state_reason’ msg to output if available
  • add system info, Python info, and package versions to crash-report
  • listregions: sort regions by name
  • improved bash/zsh completion support. completion will read from the correct config file, if possible, in the case that the global -c option is specified while completing.
  • separate the timing of cluster setup into time spent on waiting for EC2 instances to come up and time spent configuring the cluster after all instances are up and running. this is useful when profiling StarCluster’s performance on large (100+ node) clusters.

0.92

release-date:10-17-2011

News

Note

Before upgrading please be aware that rc1 does not support clusters created with previous versions and will print a warning along with instructions on how to proceed if it finds old clusters. With that said, it’s best not to upgrade until you’ve terminated any currently running clusters created with an old version. Also, the @sc-masters group is no longer needed and can be removed after upgrading by running “starcluster terminate masters”

You can find the new docs for 0.92rc1 here:

http://star.mit.edu/cluster/docs/0.92rc1/

NOTE: these docs are still very much a work in progress. If anyone is interested in helping to improve the docs, please fork the project on github and submit pull requests. Here’s a guide to get you started:

http://star.mit.edu/cluster/docs/0.92rc1/contribute.html

Please upgrade at your leisure, test the new release candidate, and submit any bug reports preferably on StarCluster’s github issue tracker or this mailing list.

Features

  • Cluster Compute/GPU Instances- Added support for the new Cluster Compute/GPU instance types. Thanks to Fred Rotbart for his contributions

  • EBS-Backed Clusters- Added support for starting/stopping EBS-backed clusters on EC2. The stop command now stops (instead of terminate) an EBS-backed cluster and terminates an S3-backed cluster. Added a terminate command that terminates any cluster

  • Improved performance - using workerpool library (http://pypi.python.org/pypi/StarCluster) to configure nodes concurrently

  • New ebsimage and s3image commands for easily creating new EBS-backed AMIs. Each command supports creating a new AMI from S3 and EBS-backed image hosts respectively. (NOTE: createimage has been renamed to s3image. you can still call createimage but this will go away in future releases)

  • Add/Remove Nodes - added new addnode and removenode commands for adding/removing nodes to a cluster and removing existing nodes from a cluster

  • Restart command - Added new restart command that reboots the cluster and reconfigures the cluster

  • Create Keypairs - Added ability to add/list/remove keypairs

  • Elastic Load Balancing - Support for shrinking/expanding clusters based on Sun Grid Engine queue statistics. This allow the user to start a single-node cluster (or larger) and scale the number of instances needed to meet the current SGE queue load. For example, a single-node cluster can be launched and as the queue load increases new EC2 instances are launched, added to the cluster, used for computation, and then removed when they’re idle. This minimizes the cost of using EC2 for an unknown and on-demand workload. Thanks to Rajat Banerjee (rqbanerjee)

  • Security Group Permissions - Added ability to specify permission settings to be applied automatically to a cluster’s security group after it’s been started

  • Multiple Instance Types - Added support for specifying instance types on a per-node basis. Thanks to Dan Yamins for his contributions

  • Unpartitioned Volumes- StarCluster now supports both partitioned and unpartitioned EBS volumes

  • New Plugin Hooks - Plugins can now play a part when adding or removing a node as well as when restarting or shutting down the entire cluster by implementing the on_remove_node, on_add_node, on_shutdown, and on_reboot methods respectively

  • Added a runplugin command and fixed the run_plugin() method in the development shell. You can now run a plugin in the dev shell like so:

    ~[1]> cm.run_plugin('myplugin', 'mycluster')
    
  • Added support for easily switching regions (without config changes) using a global -z option. For example:

    $ starcluster -r eu-west-1 listclusters
    StarCluster - (http://star.mit.edu/cluster) (v. 0.92rc1)
    Software Tools for Academics and Researchers (STAR)
    Please submit bug reports to starcluster@mit.edu
    
    >>> No clusters found...
    
  • Added new resizevolume command for easily resizing existing EBS volumes

  • Added listregions/listzones commands