使用 CLONE_NEWNS 标志克隆后挂载文件系统 [英] Mount filesystem after clone with CLONE_NEWNS flag

查看:25
本文介绍了使用 CLONE_NEWNS 标志克隆后挂载文件系统的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

I'm trying to implement the following scenario:

  1. clone() main process with CLONE_NEWNS flag (it means new mount namespace)
  2. mount() new filesystem in child process
  3. child process finished and all created in this process filesystems are unmounted

But it doesn't work as I expected and I still see mounted filesystems in main process. What am I doing wrong?

Sources are here https://github.com/dmitrievanthony/sprat/blob/master/src/container.c#L47

System is default AWS Ubuntu,

ubuntu@ip-172-31-31-112:~/sprat$ uname -a
Linux ip-172-31-31-112 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

解决方案

Short answer: It looks like the type of mount propagation isn't properly set.


Explanation

The Linux kernel defaults all mounts to MS_PRIVATE, but systemd overrides this during early boot to MS_SHARED, for the convenience of nspawn. This can be observed by looking at the optional fields of /proc/$PID/mountinfo. For instance, something like this might be expected:

$ cat /proc/self/mountinfo
  . . .
25 0 8:6 / / rw,relatime shared:1 - ext4 /dev/sda6 rw,errors=remount-ro,data=ordered
                         ^^^^^^
  . . .

Notice the underlined(by me) shared:1 field above, indicating that the current propagation type of / mount point is MS_SHARED, and the peer group ID is 1 (we won't care about peer group ID at all in our case).

When using the CLONE_NEWNS flag on clone(2) a new mount namespace is created, which is initialized as a copy of the caller's mount namespace. The new, replicated mount points of the new namespace join the same peer group as their respective original mount points in the caller's mount namespace.

The propagation type of a new mount point whose parent's propagation type is MS_SHARED, is MS_SHARED too. Thus, when your "contained" process mount()s the filesystem on the loop device, the mount is by default MS_SHARED. Later, all the mounts under it, are propagated to "main" process's namespace too, and that's the reason "main" process can see them.

For your request to be satisfied (for the "main" process not to see "contained" process's mount points), the mount propagation type you seek is either MS_SLAVE or MS_PRIVATE, depending on whether you want your "contained" process's root mount point to receive propagation events from other peers or not, respectively. Obviously, MS_PRIVATE offers greater isolation than MS_SLAVE.

Thus, in your case, it should be sufficient to change the propagation type of "contained" process's root mount point to MS_PRIVATE or MS_SLAVE before you mount the rest of the filesystems, so the mounts won't be propagated to "main" process's namespace.


The code

At first, one would try to set the propagation type properly when the "contained" process creates its root mount point.

However, I noticed the following in man 8 mount (quoting):

Note that the Linux kernel does not allow to change multiple propagation flags with a single mount(2) system call, and the flags cannot be mixed with other mount options.

Since util-linux 2.23 the mount command allows to use several propagation flags together and also together with other mount operations. This feature is EXPERIMENTAL. The propagation flags are applied by additional mount(2) system calls when the preceding mount operations were successful.

Looking at your code, the "contained" process, after it mount()s the filesystem on the loop device, it issues chroot() to it. At this point, you could set its propagation type by injecting this mount(2) call:

if (chroot(".") < 0) {
    // handle error
}

if (mount("/", "/", c->fstype, MS_PRIVATE, "") < 0) {
    // handle error
}

if (mkdir(...)) {
    // handle error
}

Now that the propagation type is set to MS_PRIVATE, all the subsequent mounts that "contained" process does under / won't be propagated, thus won't be visible in "main" process's namespace, as you can observe in /proc/mounts or /proc/$PID/mountinfo.


Resources

这篇关于使用 CLONE_NEWNS 标志克隆后挂载文件系统的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆