Mount filesystem after clone with CLONE_NEWNS flag

Question

I'm trying to implement the following scenario:

  1. clone() the main process with the CLONE_NEWNS flag (that is, into a new mount namespace)
  2. mount() a new filesystem in the child process
  3. the child process finishes, and all filesystems mounted in that process are unmounted

But it doesn't work as I expected: I still see the mounted filesystems in the main process. What am I doing wrong?

The source is here: https://github.com/dmitrievanthony/sprat/blob/master/src/container.c#L47

The system is a default AWS Ubuntu:

ubuntu@ip-172-31-31-112:~/sprat$ uname -a
Linux ip-172-31-31-112 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Recommended answer

Short answer: It looks like the type of mount propagation isn't properly set.

The Linux kernel defaults all mounts to MS_PRIVATE, but systemd overrides this during early boot to MS_SHARED, for the convenience of nspawn. This can be observed by looking at the optional fields of /proc/$PID/mountinfo. For instance, something like this might be expected:

$ cat /proc/self/mountinfo
  . . .
25 0 8:6 / / rw,relatime shared:1 - ext4 /dev/sda6 rw,errors=remount-ro,data=ordered
                         ^^^^^^
  . . .

Notice the shared:1 field above (underlined by me). It indicates that the current propagation type of the / mount point is MS_SHARED and that its peer group ID is 1 (the peer group ID doesn't matter in our case).
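
To check that field programmatically, a small parser over one mountinfo line could look like the sketch below. It is only an illustration: `mountinfo_propagation` and the `PROP_*` constants are names made up here, and it assumes the layout of six fixed fields, then zero or more optional fields, then a lone "-" separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit flags for this sketch; 0 means "private". */
enum { PROP_SHARED = 1, PROP_SLAVE = 2, PROP_UNBINDABLE = 4 };

/*
 * Inspect the optional fields of a single mountinfo line.
 * Returns a bitmask of PROP_* values (0 for a private mount), or -1 if
 * the line ends before the "-" separator. If the mount is shared and
 * peer_group is non-NULL, the peer group ID is stored there.
 */
int mountinfo_propagation(const char *line, int *peer_group)
{
    assert(line != NULL);

    int field = 0;
    int flags = 0;
    const char *p = line;

    for (;;) {
        p += strspn(p, " \t");
        size_t len = strcspn(p, " \t\n");
        if (len == 0)
            return -1;              /* no separator found */
        field++;

        if (field > 6) {            /* optional fields start at field 7 */
            int id;
            if (len == 1 && p[0] == '-')
                return flags;       /* separator: optional fields are over */
            if (sscanf(p, "shared:%d", &id) == 1) {
                flags |= PROP_SHARED;
                if (peer_group != NULL)
                    *peer_group = id;
            } else if (strncmp(p, "master:", 7) == 0) {
                flags |= PROP_SLAVE;
            } else if (len == 10 && strncmp(p, "unbindable", 10) == 0) {
                flags |= PROP_UNBINDABLE;
            }
        }
        p += len;
    }
}
```

Feeding it the sample line above reports PROP_SHARED with peer group 1; a line with no optional fields reports 0, i.e. a private mount.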

When using the CLONE_NEWNS flag on clone(2) a new mount namespace is created, which is initialized as a copy of the caller's mount namespace. The new, replicated mount points of the new namespace join the same peer group as their respective original mount points in the caller's mount namespace.

A new mount point whose parent's propagation type is MS_SHARED is itself MS_SHARED. Thus, when your "contained" process mount()s the filesystem on the loop device, that mount is MS_SHARED by default. All the mounts later made under it are propagated to the "main" process's namespace too, which is why the "main" process can see them.

For your request to be satisfied (for the "main" process not to see "contained" process's mount points), the mount propagation type you seek is either MS_SLAVE or MS_PRIVATE, depending on whether you want your "contained" process's root mount point to receive propagation events from other peers or not, respectively. Obviously, MS_PRIVATE offers greater isolation than MS_SLAVE.

Thus, in your case, it should be sufficient to change the propagation type of "contained" process's root mount point to MS_PRIVATE or MS_SLAVE before you mount the rest of the filesystems, so the mounts won't be propagated to "main" process's namespace.
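
The whole sequence can be sketched end to end as follows. This is a minimal illustration, not the code from the linked repository: `run_in_private_mount_ns`, `contained_fn`, and `example_contained` are hypothetical names, the 1 MiB stack size and exit code 125 are arbitrary, and the caller is assumed to have CAP_SYS_ADMIN (without it, clone() with CLONE_NEWNS is expected to fail with EPERM). The sketch also adds MS_REC so that every mount in the child's namespace becomes private, which goes a little further than changing only the root mount.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)   /* arbitrary size for this sketch */

/* Hypothetical callback type: the work done by the "contained" process. */
typedef int (*contained_fn)(void *arg);

struct child_ctx {
    contained_fn fn;
    void *arg;
};

/* Placeholder callback; a real one would mount() and chroot() here. */
int example_contained(void *arg)
{
    (void)arg;
    return 0;
}

static int child_entry(void *p)
{
    struct child_ctx *ctx = p;

    /* Make every mount in the new namespace private before mounting
     * anything else, so later mounts are not propagated to the parent. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return 125;

    return ctx->fn(ctx->arg);
}

/*
 * Runs fn in a child that has its own mount namespace.
 * Returns the child's exit status, or -errno if the child could not be
 * created or waited for.
 */
int run_in_private_mount_ns(contained_fn fn, void *arg)
{
    assert(fn != NULL);

    struct child_ctx ctx = { fn, arg };
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -ENOMEM;

    /* The stack grows downward on x86-64, so pass the top of the buffer. */
    pid_t pid = clone(child_entry, stack + CHILD_STACK_SIZE,
                      CLONE_NEWNS | SIGCHLD, &ctx);
    if (pid < 0) {
        int err = errno;
        free(stack);
        return -err;
    }

    int status = 0;
    int rc;
    if (waitpid(pid, &status, 0) < 0)
        rc = -errno;
    else if (WIFEXITED(status))
        rc = WEXITSTATUS(status);
    else
        rc = -EINTR;                 /* child was killed by a signal */

    free(stack);
    return rc;
}
```

Because the child's namespace is destroyed when its last process exits, mounts made inside the callback disappear with it, and with private propagation they never show up in the parent in the first place.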

At first, one would try to set the propagation type properly when the "contained" process creates its root mount point.

However, I noticed the following in man 8 mount (quoting):

Note that the Linux kernel does not allow to change multiple propagation flags with a single mount(2) system call, and the flags cannot be mixed with other mount options.

Since util-linux 2.23 the mount command allows to use several propagation flags together and also together with other mount operations. This feature is EXPERIMENTAL. The propagation flags are applied by additional mount(2) system calls when the preceding mount operations were successful.
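
The restriction quoted above can be checked before calling mount(2). The sketch below is an illustration only: `propagation_flags_valid` and `set_propagation` are hypothetical helpers. Allowing MS_REC and MS_SILENT alongside the propagation type follows the mount(2) manual page rather than the passage quoted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mount.h>

#define PROPAGATION_MASK \
    ((unsigned long)(MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))

/* True if flags name exactly one propagation type and nothing else
 * except the optional MS_REC and MS_SILENT modifiers. */
bool propagation_flags_valid(unsigned long flags)
{
    unsigned long type = flags & PROPAGATION_MASK;
    unsigned long rest = flags & ~PROPAGATION_MASK;

    /* Exactly one bit set: non-zero and a power of two. */
    if (type == 0 || (type & (type - 1)) != 0)
        return false;

    return (rest & ~(unsigned long)(MS_REC | MS_SILENT)) == 0;
}

/* Changes the propagation type of target.
 * Returns 0 on success or a negative errno value. */
int set_propagation(const char *target, unsigned long flags)
{
    assert(target != NULL);

    if (!propagation_flags_valid(flags))
        return -EINVAL;

    /* source, filesystem type and data are ignored for this operation. */
    if (mount(NULL, target, NULL, flags, NULL) < 0)
        return -errno;
    return 0;
}
```

A call such as set_propagation("/", MS_REC | MS_PRIVATE) passes the check, while combining MS_PRIVATE with MS_SLAVE or with an ordinary option like MS_RDONLY is rejected before the kernel is involved.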

Looking at your code, the "contained" process mount()s the filesystem on the loop device and then chroot()s into it. At this point, you could set its propagation type by injecting this mount(2) call:

if (chroot(".") < 0) {
    // handle error
}

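// Change only the propagation type of the new root. For this kind of
// call, mount(2) ignores the source, filesystem type and data arguments.
if (mount("/", "/", c->fstype, MS_PRIVATE, "") < 0) {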
    // handle error
}

if (mkdir(...)) {
    // handle error
}

Now that the propagation type is set to MS_PRIVATE, none of the subsequent mounts that the "contained" process makes under / will be propagated, so they won't be visible in the "main" process's namespace, as you can observe in /proc/mounts or /proc/$PID/mountinfo.
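
That observation can be automated by comparing the mountinfo contents of the two processes. The helper below is a sketch under assumptions: `mountinfo_lists` is a hypothetical name, it takes the file contents already read into a string, and it expects the mount point in the same escaped form the kernel prints (for example, a space appears as \040).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if any line of the given mountinfo text has mount_point
 * as its fifth field (the mount point relative to the process's root).
 */
bool mountinfo_lists(const char *text, const char *mount_point)
{
    assert(text != NULL && mount_point != NULL);

    size_t want = strlen(mount_point);
    const char *line = text;

    while (*line != '\0') {
        size_t line_len = strcspn(line, "\n");
        const char *p = line + strspn(line, " ");

        /* Skip mount ID, parent ID, major:minor and root. */
        for (int i = 0; i < 4; i++) {
            p += strcspn(p, " \n");
            p += strspn(p, " ");
        }

        size_t len = strcspn(p, " \n");
        if (len > 0 && len == want && strncmp(p, mount_point, len) == 0)
            return true;

        line += line_len;
        if (*line == '\n')
            line++;
    }
    return false;
}
```

Reading /proc/$PID/mountinfo for the "main" process and for the "contained" process and calling this on each should show the new mount point only in the latter once the propagation type is private.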

  • Linux kernel's Shared Subtrees documentation, for more information on mount propagation.
  • Michael Kerrisk's excellent LWN article, which explains mount namespaces better than I could.
