StarCluster - Mailing List Archive

Re: StarCluster base AMI converted to C5 instance type

From: Vasisht Tadigotla <no email>
Date: Thu, 19 Apr 2018 08:51:22 -0700

Hi Sergio,

The _get_volume_device function in volume.py will also need to be modified.
There seem to be utilities (nvme id-ctrl) that can map an NVMe device to a
volume ID. I don’t know whether doing this up front on the base image would
resolve the mount issues without having to modify the underlying code.
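A minimal sketch of the volume-ID lookup suggested above, in Python. It assumes the EC2 convention that an EBS volume attached over NVMe reports its volume ID without the hyphen (e.g. `vol0d8792d3f9ae70b7a`) as the controller serial number, which is the field `nvme id-ctrl` prints, and that Linux exposes that serial in `/sys/class/nvme/<controller>/serial`. It also assumes each volume appears as namespace 1. The function name `nvme_volume_map` is hypothetical and not part of StarCluster.

```python
import os


def nvme_volume_map(sys_class_nvme="/sys/class/nvme"):
    """Hypothetical helper: map EBS volume IDs to NVMe block devices.

    Assumes an EBS controller's serial is the volume ID without its hyphen
    and that each volume is exposed as namespace 1 (/dev/nvmeXn1).
    """
    mapping = {}
    if not os.path.isdir(sys_class_nvme):
        return mapping  # no NVMe controllers, e.g. an older Xen instance
    for ctrl in sorted(os.listdir(sys_class_nvme)):
        serial_path = os.path.join(sys_class_nvme, ctrl, "serial")
        try:
            with open(serial_path) as f:
                serial = f.read().strip()
        except OSError:
            continue  # not a controller directory, or unreadable
        # Restore the hyphen: "vol0abc..." -> "vol-0abc..."
        if serial.startswith("vol") and not serial.startswith("vol-"):
            serial = "vol-" + serial[3:]
        # Instance-store NVMe devices use other serials and are skipped.
        if serial.startswith("vol-"):
            mapping[serial] = "/dev/%sn1" % ctrl
    return mapping
```

On the master node, `nvme_volume_map().get("vol-0d8792d3f9ae70b7a")` would give the device to mount, or `None` if that volume is not attached over NVMe.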

Cheers,
Vasisht


On April 19, 2018 at 8:25:26 AM, Sergio Mafra (sergiohmafra_at_gmail.com)
wrote:

Hi folks,

Unfortunately, it seems that this will require a change in StarCluster's
code to support the new C5 instances and their new device names.
If you take a look at clustersetup.py, you will notice that some work must be
done.

All best,

Sergio

2018-04-18 20:28 GMT-03:00 Lyn Gerner <schedulerqueen_at_gmail.com>:

> Sergio, sounds like you're getting close. /dev/xvdz is the real device
> behind the /dev/sdz link that some Linux variants create (the name perhaps
> more familiar to StarCluster users). You will need to update any reference
> to it from /dev/xvdz to /dev/nvme25n1. (In NVMe land, "a" maps to 0 (zero),
> "b" maps to 1, ..., "z" maps to 25.)
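The letter-to-number rule above can be written as a small helper. This only encodes the convention as stated in this message: AWS's EC2 documentation notes that NVMe device names are assigned in the order the devices are detected and may not match the block device mapping, so a name derived this way should be confirmed (for example against the volume ID in the controller serial) before mounting. The helper `nvme_name_for` is hypothetical and uses the kernel's `/dev/nvme<controller>n<namespace>` form with namespace 1.

```python
import os


def nvme_name_for(device):
    """Hypothetical helper: map /dev/xvdX or /dev/sdX to /dev/nvmeNn1 using
    the letter rule from the message above ("a" -> 0, ..., "z" -> 25).

    The rule is an assumption; EC2 may number NVMe devices differently.
    """
    base = os.path.basename(device)
    for prefix in ("xvd", "sd"):
        suffix = base[len(prefix):]
        # Only single-letter names such as "xvdz" or "sdb" are handled.
        if base.startswith(prefix) and len(suffix) == 1 and "a" <= suffix <= "z":
            return "/dev/nvme%dn1" % (ord(suffix) - ord("a"))
    raise ValueError("unsupported device name: %r" % device)
```

Under this rule, `nvme_name_for("/dev/xvdz")` returns `"/dev/nvme25n1"`.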
>
>
>
> On Wed, Apr 18, 2018 at 2:28 PM, Colby Taperts <
> colby.taperts_at_codewilling.com> wrote:
>
>> Hi Sergio,
>>
>> I would check the `~/.starcluster/config` file; it looks like you copied
>> over an older config that still has the leftover volumes in it.
>>
>> Also, see http://star.mit.edu/cluster/docs/0.93.3/manual/configuration.html;
>> it may help you out.
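One way to hunt for leftover volumes is to list which `[volume ...]` sections each `[cluster ...]` section pulls in through its `VOLUMES` setting. The sketch below assumes the config layout described in the configuration manual (`VOLUME_ID` and `MOUNT_PATH` inside `[volume NAME]`), parses it with Python's `configparser`, and does not follow `EXTENDS` inheritance between cluster templates. `volume_report` is a hypothetical helper.

```python
import configparser


def volume_report(config_text):
    """Hypothetical helper: return
    {cluster_name: [(volume_name, volume_id, mount_path), ...]}.

    Volume names referenced by a cluster but never defined map to
    (None, None), which is a hint that the config is stale.
    """
    # interpolation=None so stray "%" characters are not treated specially;
    # option names are lowercased by configparser, so VOLUME_ID -> volume_id.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    volumes = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")
        if kind == "volume":
            volumes[name] = (parser.get(section, "volume_id", fallback=None),
                             parser.get(section, "mount_path", fallback=None))

    report = {}
    for section in parser.sections():
        kind, _, name = section.partition(" ")
        if kind == "cluster" and parser.has_option(section, "volumes"):
            refs = [v.strip() for v in parser.get(section, "volumes").split(",")
                    if v.strip()]
            report[name] = [(v,) + volumes.get(v, (None, None)) for v in refs]
    return report
```

Running it on the text of `~/.starcluster/config` and checking each reported volume ID against the volumes that still exist in the account would point at stale entries.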
>>
>> Good luck,
>> Colby
>>
>> On Wed, Apr 18, 2018 at 4:19 PM Sergio Mafra <sergiohmafra_at_gmail.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> Good news: I've managed to resolve the previous SGEPlugin issue just by
>>> deleting /opt/sge6 from the base AMI.
>>>
>>> Now the problem is the following:
>>>
>>> *** WARNING - Cannot find device /dev/xvdz for volume
>>> vol-0d8792d3f9ae70b7a
>>> *** WARNING - Not mounting vol-0d8792d3f9ae70b7a on /home
>>> *** WARNING - This usually means there was a problem attaching the EBS
>>> volume to the master node
>>>
>>> StarCluster is looking for the old EBS device names. How do I handle that
>>> on a C5 instance?
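A sketch of the kind of fallback lookup that could replace a fixed device path: try the requested name, then its `sd`/`xvd` alias, then the NVMe device whose serial matches the volume ID. This is not StarCluster's code; `resolve_volume_device` and its parameters are hypothetical, and the `nvme_map` argument is assumed to be built elsewhere (for example from the NVMe controller serials). The `exists` parameter defaults to `os.path.exists` and can be swapped out for testing.

```python
import os


def resolve_volume_device(requested, volume_id, nvme_map, exists=os.path.exists):
    """Hypothetical helper: return the device path backing volume_id, or None.

    requested: name from the block device mapping, e.g. "/dev/sdz".
    nvme_map:  {volume_id: "/dev/nvmeXn1"}, assumed to be built elsewhere.
    """
    base = os.path.basename(requested)
    candidates = [requested]
    # Older kernels/AMIs may expose /dev/sdX as /dev/xvdX or vice versa.
    if base.startswith("sd"):
        candidates.append("/dev/xvd" + base[2:])
    elif base.startswith("xvd"):
        candidates.append("/dev/sd" + base[3:])
    for path in candidates:
        if exists(path):
            return path
    # Nitro instances (e.g. C5) without compatibility symlinks: use NVMe.
    nvme = nvme_map.get(volume_id)
    if nvme and exists(nvme):
        return nvme
    return None
```

Checking the Xen-style names first keeps the behaviour unchanged on older instance types, while the NVMe lookup only kicks in when neither name exists.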
>>>
>>> All best,
>>>
>>> Sergio
>>>
>>> 2018-04-18 8:54 GMT-03:00 Sergio Mafra <sergiohmafra_at_gmail.com>:
>>>
>>>> Hi Teddy,
>>>>
>>>> It's really odd. I've got two base AMIs for StarCluster: one for Ubuntu
>>>> 14.04 and the other for Ubuntu 16.04.
>>>> The older one (14.04) was converted to ENA with no problems, but when I
>>>> tried to provision it with StarCluster, it gave an old SGEPlugin error:
>>>> !!! ERROR - Error occured while running plugin
>>>> 'starcluster.plugins.sge.SGEPlugin':
>>>> !!! ERROR - remote command 'source /etc/profile && cd /opt/sge6 &&
>>>> !!! ERROR - TERM=rxvt ./inst_sge_sc -x -noremote -auto ./ec2_sge.conf'
>>>> !!! ERROR - failed with status 1:
>>>> !!! ERROR - Reading configuration from file ./ec2_sge.conf
>>>> !!! ERROR - [H[2J
>>>> The 16.04 one does not seem to have been converted to ENA... and became
>>>> unreachable.
>>>>
>>>> I don't have AWS Support... :(
>>>>
>>>> All best,
>>>>
>>>> Sergio
>>>>
>>>> 2018-04-17 19:36 GMT-03:00 Teddy Thomas <tjthomas292_at_gmail.com>:
>>>>
>>>>> Hi Sergio-
>>>>>
>>>>> I looked through the issues in the GitHub repo for the Amazon drivers and
>>>>> found someone having a similar issue, though with Debian instead of Ubuntu:
>>>>> https://github.com/amzn/amzn-drivers/issues/63. It's possible there's
>>>>> an issue with the driver, or something in the AMI. In that issue, they
>>>>> recommended reaching out to AWS. Have you reached out to AWS Support yet?
>>>>> If you do end up finding out the problem, or get this working, I'd be
>>>>> curious to know about it. I hope that's a pointer in the right direction at
>>>>> least, and sorry I'm not more help.
>>>>>
>>>>> -Teddy
>>>>>
>>>>> On Mon, Apr 16, 2018 at 12:55 PM Sergio Mafra <sergiohmafra_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi fellows,
>>>>>>
>>>>>> I've tried to prepare the public StarCluster Ubuntu 16.04 AMI
>>>>>> (ami-040b6113) to be compatible with ENA, so that it can be provisioned
>>>>>> as a C5 instance type.
>>>>>>
>>>>>> I've followed these steps:
>>>>>> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html#enhanced-networking-ena-ubuntu
>>>>>>
>>>>>> After rebooting, the instance could not be reached when provisioned as
>>>>>> a C5 type.
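C5 instances run on AWS's Nitro system and need both the ENA driver (networking) and the NVMe driver (EBS volumes) in the image, so a missing driver is one plausible reason for an unreachable instance. The sketch below, run on the instance before switching it to C5, reports whether each driver is built into the kernel or available as a module under `/lib/modules`. It is a rough check under stated assumptions: it does not verify that the driver is included in the initramfs, and it does not check the `enaSupport` attribute that the AWS guide also requires. `driver_status` is a hypothetical helper.

```python
import os
import platform


def driver_status(name, release=None, modules_root="/lib/modules"):
    """Hypothetical helper: return "builtin", "module", or "missing"."""
    release = release or platform.release()  # defaults to the running kernel
    base = os.path.join(modules_root, release)

    # modules.builtin lists drivers compiled into the kernel image,
    # one path per line, e.g. "kernel/drivers/nvme/host/nvme.ko".
    builtin = os.path.join(base, "modules.builtin")
    if os.path.exists(builtin):
        with open(builtin) as f:
            for line in f:
                if os.path.basename(line.strip()) == name + ".ko":
                    return "builtin"

    # Otherwise look for a loadable module file such as ena.ko or ena.ko.xz.
    for _dirpath, _dirnames, filenames in os.walk(base):
        for filename in filenames:
            stem, sep, _rest = filename.partition(".ko")
            if sep and stem == name:
                return "module"
    return "missing"
```

For example, `driver_status("ena")` and `driver_status("nvme")` on the Ubuntu 16.04 instance should both report something other than `"missing"` before the instance is stopped and changed to a C5 type.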
>>>>>>
>>>>>> Has anyone made progress on this?
>>>>>>
>>>>>> All best,
>>>>>>
>>>>>> Sergio Mafra
>>>>>> _______________________________________________
>>>>>> StarCluster mailing list
>>>>>> StarCluster_at_mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>>
>>>>> --
>>>>> Sent from my iPhone
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
Received on Thu Apr 19 2018 - 11:51:29 EDT
This archive was generated by hypermail 2.3.0.
