StarCluster - Mailing List Archive

Re: Issues with using an EBS volume

From: Jacob Barhak <no email>
Date: Mon, 29 Jul 2013 00:59:21 -0500

Thanks Rayson,

I am amazed by the speed and your responsiveness this entire week.

I am reporting these issues so that others who encounter them can resolve or avoid them - in the spirit of open source. From this perspective our conversation is helpful to others and may give the developers a better picture of what their customers are experiencing. Hence the long text here.

I am trying to think about the simplest solution that would be suitable for me in the short and longer term.

Chances are that the 0.94 easy install issue will be fixed quickly, so there will be no need for users to touch the code. Even though StarCluster is open source, it is generally a bad idea to have users change code; it is the last resort. Nevertheless, an intelligent user now knows what to do, thanks to your explanation.

Following the same line of thought, I wish to release code that requires as little effort from the user as possible and keeps working for a long time without the need to release a new version.

Therefore my solution priorities would be:
1. Wait for the fix of the 0.94 easy install and use AWS. This is in your interest, since otherwise you may lose users who cannot install the tool and abandon it.
2. As a backup solution, test the NFS share plugin, which, if it works well, avoids the use of AWS altogether and therefore has advantages - even with 0.94.
3. If I am pressed for time - which I am not - implement your suggested code change myself and include it in the instructions when I release my code.

Your team should be proud of giving excellent support and you can certainly sleep well at night knowing you are doing a great job on an excellent tool.

        Jacob


Sent from my iPhone

On Jul 28, 2013, at 11:48 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:

> (Fixed some typos in my previous email... I should have gone to bed
> hours before but still up as I needed to finish something else...
> excuse my typos! :-D )
>
> Hi,
>
> Some background info: The EBS mounting bug only affects launching from
> Windows, which is what you are running...
>
> To fix the bug without upgrading to 0.94, you can merge the fix by
> hand into your local SC 0.93.3 installation:
>
> https://github.com/jtriley/StarCluster/commit/64fa4b8fac7318c7a77d9d769a5826fa76e62341
>
> It is a 2-line fix, basically changing "os.path.join(path, f)" to
> "posixpath.join(path, f)". The reason this is needed is because
> os.path.join() uses the local OS's path separator to join the path,
> and on Windows it is "\", so the result becomes "/dev\sdz" instead of
> "/dev/sdz". Launching on Unix or Linux it is not an issue because the
> local & remote (EC2 side) path separators are both "/".
>
> It is a safe and simple fix (just 1 file:
> starcluster/sshutils/__init__.py), and should fix the issue you are
> encountering... Let me know if you encounter issues with merging that
> fix.
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
> On Mon, Jul 29, 2013 at 12:25 AM, Jacob Barhak <jacob.barhak_at_gmail.com> wrote:
>> Thanks Rayson,
>>
>> If this is the case, then there is no quick solution to the disk space issue
>> that you helped identify this last week.
>>
>> I cannot install version 0.94 since easy install does not work well in its
>> current form. See the following link for a full report:
>> http://star.mit.edu/cluster/mlarchives/1798.html
>>
>> Alternatively, the solution you suggested of exporting the volumes from the
>> master to share these with the nodes would help, yet it involves many
>> instructions made manually after the cluster is up and running. So as it is,
>> it is too cumbersome to implement and for sure hard to document so someone
>> inexperienced can follow easily.
>>
>> The quickest least painful solution that may be open for me now would be to
>> try the plugin pointed to by scrappythekangaroo about 3 months ago in:
>> https://github.com/jtriley/StarCluster/issues/44
>>
>> So unless you can fix the 0.94 installation quickly, I will have to try the
>> plugin solution in the hope that it fully resolves the issue.
>>
>> I hope I can reach an easy and quick solution for increasing the available
>> disk space.
>>
>> Jacob
>>
>>
>> Sent from my iPhone
>>
>> On Jul 28, 2013, at 10:04 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>>
>> I believe you are hitting this bug:
>>
>> https://github.com/jtriley/StarCluster/pull/147
>>
>> And I verified that 0.93.3 does not have the fix, and 0.94 or the dev
>> version should be fine.
>>
>> Rayson
>>
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>>
>> On Sun, Jul 28, 2013 at 3:58 AM, Jacob Barhak <jacob.barhak_at_gmail.com>
>> wrote:
>>
>> Hello,
>>
>> Perhaps someone in the group can help out with using an EBS volume.
>>
>> I created an EBS volume and want to launch a cluster that uses it. I am
>> doing this in an attempt to solve the disk limitation problem I encountered,
>> which is reported in this list at:
>> http://star.mit.edu/cluster/mlarchives/1795.html
>>
>> However, I encounter the following error while starting the cluster:
>>
>> Setting up the cluster...
>> Attaching volume vol-f0ae61cb to master node on /dev/sdz ...
>> Configuring hostnames...
>> 2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>> !!! ERROR - volume has more than one partition, please specify which
>> partition to use (e.g. partition=0, partition=1, etc.) in the volume's config
>>
>> The full transcript is attached.
>>
>> I have the following lines in my configuration file:
>>
>> VOLUMES = mydata
>> ...
>> [volume mydata]
>> VOLUME_ID = vol-f0ae61cb
>> MOUNT_PATH = /mydata
>>
>> I tried adding PARTITION = 0 and PARTITION = 1 to the volume definition in
>> the configuration file, yet nothing seems to fix this.
>>
>> I also tried using "starcluster createvolume" to create the volume, yet I
>> encountered the same issue as above regardless of how I created the 20 GB
>> volume.
>>
>> I am using ami-a4d64194 for my node images. My configuration and plugin are
>> derived from the files in https://github.com/ContinuumIO/anaconda-ec2
>>
>> I am operating in us-west-2. I am using StarCluster 0.93.3 with Windows 7.
>>
>> If anyone has a quick solution or diagnostic test, I will appreciate the
>> feedback.
>>
>> Jacob
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
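
Note on the fix quoted above: the path-separator problem Rayson describes can be reproduced on any platform with Python's standard library. The sketch below is not StarCluster code. The function names are hypothetical, and ntpath (the module that os.path resolves to on Windows) stands in for a Windows client so the behaviour can also be observed on Linux or macOS.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # the implementation behind os.path on Unix/Linux


def join_like_windows_client(directory, name):
    """Hypothetical helper: join the way os.path.join() would on Windows."""
    return ntpath.join(directory, name)


def join_for_remote_host(directory, name):
    """Hypothetical helper: always join with "/", as the remote EC2 host expects."""
    return posixpath.join(directory, name)


if __name__ == "__main__":
    # A Windows-style join inserts a backslash between the two components.
    print(join_like_windows_client("/dev", "sdz"))
    # A POSIX join yields the device path that exists on the remote node.
    print(join_for_remote_host("/dev", "sdz"))
```

Because the remote node is always Linux, choosing posixpath explicitly makes the result independent of the operating system the client happens to run on, which is the idea behind the two-line change described in the quoted message.
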
Received on Mon Jul 29 2013 - 01:59:37 EDT
This archive was generated by hypermail 2.3.0.
