StarCluster - Mailing List Archive

Re: docker daemon not found when docker command executed with qsub

From: Xander Dunn <no email>
Date: Mon, 16 Nov 2015 23:36:17 -0800

You’re right, thanks very much!

Submitting the job `qsub -b y -cwd id` produces:
uid=1001(sgeadmin) gid=1001(sgeadmin) groups=1001(sgeadmin),20000

Strangely, however, executing the same command on the same node with ssh yields a different result:
sgeadmin@master:~$ ssh node001 id
uid=1001(sgeadmin) gid=1001(sgeadmin) groups=1001(sgeadmin),999(docker)
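
The difference between the two `id` lines can also be checked mechanically, which helps when the group lists are long. This is a small Python sketch, not part of StarCluster or Grid Engine: the helper names `parse_groups` and `missing_groups` are hypothetical, and it assumes the default GNU coreutils `id` output format shown above. Applied to the two outputs, it reports that the qsub job lacks the docker group (gid 999). The unnamed `20000` in the qsub output is most likely the per-job additional group ID that Grid Engine allocates from the execd `gid_range` setting for process tracking, rather than a regular group.

```python
import re

# One entry of the groups= field: "1001(sgeadmin)" or a bare gid such as "20000".
_GROUP_ENTRY = re.compile(r"(\d+)(?:\(([^)]*)\))?")


def parse_groups(id_output: str) -> dict:
    """Return {gid: name} parsed from the groups= field of `id` output.

    Assumes the default GNU coreutils format. A gid with no name in the
    group database maps to None.
    """
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return {}
    groups = {}
    for entry in match.group(1).split(","):
        m = _GROUP_ENTRY.fullmatch(entry)
        if m:
            groups[int(m.group(1))] = m.group(2)
    return groups


def missing_groups(reference: str, candidate: str) -> dict:
    """Groups present in the `reference` id output but absent from `candidate`."""
    ref = parse_groups(reference)
    cand = parse_groups(candidate)
    return {gid: name for gid, name in ref.items() if gid not in cand}


if __name__ == "__main__":
    qsub_id = "uid=1001(sgeadmin) gid=1001(sgeadmin) groups=1001(sgeadmin),20000"
    ssh_id = "uid=1001(sgeadmin) gid=1001(sgeadmin) groups=1001(sgeadmin),999(docker)"
    print(missing_groups(ssh_id, qsub_id))  # {999: 'docker'}
```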

This explains the discrepancy I’m seeing. Why does the qsub job run as uid 1001 without the docker group, while the ssh session runs as the same uid 1001 with it?
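
As Rayson notes in the quoted reply, the process must be in the docker group to dial `/var/run/docker.sock`. That is also why the original error says `permission denied` rather than reporting a missing socket: the daemon is running, but the job's process is not allowed to connect to it. The Python sketch below separates these cases by attempting a plain `connect()` on the socket. The function name `diagnose_socket` is hypothetical; the default path is the one from the error message. It assumes the usual Linux behavior, in which connecting to a Unix stream socket requires write permission on the socket file, and that the Docker socket is writable only by root and the docker group, as is typical.

```python
import errno
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # path taken from the error message


def diagnose_socket(path: str = DOCKER_SOCKET) -> str:
    """Try to connect to a Unix stream socket and classify the outcome.

    Returns "ok", "permission denied", "missing", "refused", or
    "error: <errno name>" for any other OSError.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "ok"
    except PermissionError:
        # The socket file exists, but this process may not write to it,
        # e.g. because it is not in the group that owns the socket.
        return "permission denied"
    except FileNotFoundError:
        # No socket file at all: the daemon is probably not running.
        return "missing"
    except ConnectionRefusedError:
        # The file exists, but nothing is accepting connections on it.
        return "refused"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(diagnose_socket())
```

Saved under a hypothetical name such as `diagnose.py`, the script could be run both ways, e.g. `qsub -b y -cwd python3 diagnose.py` and `ssh node001 python3 diagnose.py`. With the group lists shown above, the job would be expected to print `permission denied` and the ssh session `ok`.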

My first thought was to run `usermod` on my AMI to add the sgeadmin user to the docker group, but I realized there is no sgeadmin user on the AMI: StarCluster creates it when the node boots.

How can I set the docker group membership for sgeadmin so that it also applies to qsub jobs?

Thanks,
Xander

> On Nov 16, 2015, at 19:26, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>
> Xander,
>
> Can you check whether the Grid Engine job environment has the "docker" group as one of the supplemental groups by submitting a job that runs "id"?
>
> http://man7.org/linux/man-pages/man1/id.1.html
>
> IIRC, Docker requires the process to be a member of the docker group in order to dial /var/run/docker.sock.
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
> On Mon, Nov 16, 2015 at 7:15 PM, Xander Dunn <xander.dunn_at_icloud.com> wrote:
> >
> > I have StarCluster installed from the develop branch because I need to use c4 instance types, which aren’t supported in a released version yet. I have Open Grid Scheduler 2011.11 installed on an Ubuntu 14.04 AMI.
> >
> > I have Docker installed in that AMI and the daemon starts on boot. If I manually ssh into my master node or any worker node and execute a Docker command, it works. The Docker daemon is found and the command succeeds. Furthermore, executing any docker command from the master node in the form `ssh node001 docker pull IMAGE` also works correctly.
> >
> > However, those same commands, when executed with qsub, will fail because the running Docker daemon can’t be found:
> > Post IMAGE: dial unix /var/run/docker.sock: permission denied.
> > * Are you trying to connect to a TLS-enabled daemon without TLS?
> > * Is your docker daemon up and running?
> >
> > Example: `qsub -V -b y -cwd docker pull ubuntu:14.04`
> >
> > What’s the difference in the way qsub executes commands that’s causing this?
> >
> > Thanks,
> > Xander
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster_at_mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Tue Nov 17 2015 - 02:36:43 EST