StarCluster - Mailing List Archive

Re: Possible NFS setup error when adding new nodes to a cluster?

From: Paul Koerbitz <no email>
Date: Thu, 19 Jan 2012 10:33:17 +0100

Hello Justin,

I checked out the develop branch from the link you sent me and can confirm
this is fixed. I started a cluster with a single master node, then added one
node and then another. /etc/exports is no longer clobbered, and everything
NFS-wise seems to work.

cheers
Paul

Here is what the /etc/exports file looked like after each step:

root_at_master:/data# cat /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#

root_at_master:/data# cat /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#
/home node001(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
/data node001(async,no_root_squash,no_subtree_check,rw)

root_at_master:/data# cat /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#
/home node001(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
/data node001(async,no_root_squash,no_subtree_check,rw)
/home node002(async,no_root_squash,no_subtree_check,rw)
/opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
/data node002(async,no_root_squash,no_subtree_check,rw)
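
As an extra check (assuming the standard NFS client tools are installed on
the nodes), the export list can also be queried from a node with showmount,
which should print something like:

root_at_node002:~# showmount -e master
Export list for master:
/home     node001,node002
/opt/sge6 node001,node002
/data     node001,node002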

On Wed, Jan 18, 2012 at 23:45, Paul Koerbitz <paul.koerbitz_at_gmail.com> wrote:

> Hi Justin,
>
> ok great. I have something running right now that I don't want to
> interrupt, but I might be able to take a stab at it tomorrow and will
> report back then.
>
> cheers
> Paul
>
> On Wed, Jan 18, 2012 at 23:17, Justin Riley <jtriley_at_mit.edu> wrote:
>
>> Hi Paul,
>>
>> No problem at all, and thanks for the kind words. From my limited
>> testing I believe this is fixed in the latest GitHub code, which will
>> be included in tomorrow's patch release:
>>
>> http://tinyurl.com/8axmckc
>>
>> If you could test the latest GitHub code and report back whether or not
>> it fixes the issue for you, that'd be very helpful.
>>
>> ~Justin
>>
>> On 01/18/2012 03:44 PM, Paul Koerbitz wrote:
>> > Hi Justin,
>> >
>> > thanks for the fast response and the great work. I thought about
>> > taking a crack at a fix myself, but I'm not familiar with the
>> > codebase and have very little time right now.
>> >
>> > thanks Paul
>> >
>> > On Wed, Jan 18, 2012 at 21:33, Justin Riley <jtriley_at_mit.edu> wrote:
>> >
>> > Hi Paul,
>> >
>> > I just tested for myself and I can confirm that /etc/exports is
>> > indeed being clobbered when running the 'addnode' command. I'm
>> > working on a patch release to fix this and other minor things.
>> > Should be out tomorrow.
>> >
>> > Thanks for reporting!
>> >
>> > ~Justin
>> >
>> > On 01/18/2012 02:08 PM, Paul Koerbitz wrote:
>> >> Dear starcluster team,
>> >
>> >> I tripped over what might be an error with the NFS setup when
>> >> adding new nodes to a cluster.
>> >
>> >> I initially set up my cluster with the master node only, then
>> >> added one node and subsequently 4 more nodes. I noticed that my
>> >> EBS volume wasn't getting mounted correctly on the nodes:
>> >> running 'df' reported 'stale file handle' for /home, /opt/sge6,
>> >> and /data.
>> >
>> >> My impression is that, as nodes get added, the /etc/exports file,
>> >> which controls NFS access, gets overwritten, so only the most
>> >> recently added node can access the shared file systems.
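>> >
>> >> Presumably the export lines are written out with a truncating
>> >> write instead of an append when a node is added; roughly the
>> >> difference between the following (just a sketch, not
>> >> StarCluster's actual code):
>> >
>> >> echo '/data node002(rw)' >> /etc/exports  # append: earlier nodes kept
>> >> echo '/data node002(rw)' > /etc/exports   # truncate: earlier nodes lost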
>> >
>> >> Here is how I resolved the issue. First I unmounted all the
>> >> volumes:
>> >
>> >> root_at_node001:~# umount -f /data
>> >
>> >> At this point remounting doesn't work:
>> >
>> >> root_at_node001:~# mount -t nfs master:/data /data
>> >
>> >> mount.nfs: access denied by server while mounting master:/data
>> >
>> >
>> >> I then edited /etc/exports on the master node. There, only the
>> >> last node was listed:
>> >
>> >> /home node005(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> >> /data node005(async,no_root_squash,no_subtree_check,rw)
>> >
>> >> I changed this to:
>> >
>> >> /home node001(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node001(async,no_root_squash,no_subtree_check,rw)
>> >> /data node001(async,no_root_squash,no_subtree_check,rw)
>> >> /home node002(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node002(async,no_root_squash,no_subtree_check,rw)
>> >> /data node002(async,no_root_squash,no_subtree_check,rw)
>> >> /home node003(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node003(async,no_root_squash,no_subtree_check,rw)
>> >> /data node003(async,no_root_squash,no_subtree_check,rw)
>> >> /home node004(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node004(async,no_root_squash,no_subtree_check,rw)
>> >> /data node004(async,no_root_squash,no_subtree_check,rw)
>> >> /home node005(async,no_root_squash,no_subtree_check,rw)
>> >> /opt/sge6 node005(async,no_root_squash,no_subtree_check,rw)
>> >> /data node005(async,no_root_squash,no_subtree_check,rw)
>> >
>> >> Then I restarted the NFS server:
>> >
>> >> $ /etc/init.d/nfs-kernel-server restart
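>> >
>> >> Alternatively, if I remember the tooling right, 'exportfs -ra'
>> >> should re-read /etc/exports and re-export everything without
>> >> restarting the whole daemon:
>> >
>> >> $ exportfs -ra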
>> >
>> >> After that, running 'df' on each node showed NFS working
>> >> correctly.
>> >
>> >> kind regards Paul
>> >
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>
>
Received on Thu Jan 19 2012 - 04:33:50 EST