madvise(DODUMP) on the same ptr/size as a successful madvise(DONTDUMP) fails with EINVAL


Question

While testing an anomaly in MariaDB's mysqld (10.3 branch), I traced what it does on startup:

A memory allocation of bytes=2097152 returned ptr=0x7fffe1a00000.

Before the madvise syscall, the /proc/{pid}/smaps entry is:

7fffe1a00000-7fffe1c00000 rw-s 00000000 00:0f 18481215                   /SYSV00000000 (deleted)
Size:               2048 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms de ht sd 

After the call:

madvise(ptr, bytes, MADV_DONTDUMP)

the mapping picks up the dd ("don't dump") flag as expected:

7fffe1a00000-7fffe1c00000 rw-s 00000000 00:0f 18481215                   /SYSV00000000 (deleted)
Size:               2048 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms de ht dd sd 

Some time later, just before madvise(ptr, m_size, MADV_DODUMP), the mapping is unchanged:

7fffe1a00000-7fffe1c00000 rw-s 00000000 00:0f 18481215                   /SYSV00000000 (deleted)
Size:               2048 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr sh mr mw me ms de ht dd sd 

The next call is:

madvise(ptr, m_size, MADV_DODUMP)

GDB shows that the same values are used:

(gdb) p size
$1 = 2097152
(gdb) p ptr
$2 = (void *) 0x7fffe1a00000

madvise(ptr, size, MADV_DODUMP) returns -1 with errno=EINVAL, and the smaps entry remains the same.

Kernel version:

$ uname -a
Linux 4.18.9-300.fc29.x86_64 #1 SMP Thu Sep 20 02:32:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

For completeness, here is a strace -fe trace=%memory ... extract of the same program (a different execution), from the allocation to the EINVAL:

[pid  6036] shmat(18874431, NULL, 0)    = 0x7f6ebda00000
[pid  6036] madvise(0x7f6ebda00000, 2097152, MADV_DONTDUMP) = 0
[pid  6036] mmap(NULL, 2215936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6ebd7e3000
[pid  6036] brk(NULL)                   = 0x55caa0d76000
[pid  6036] brk(0x55caa0de7000)         = 0x55caa0de7000
[pid  6036] brk(NULL)                   = 0x55caa0de7000
[pid  6036] brk(0x55caa0e38000)         = 0x55caa0e38000
[pid  6036] brk(NULL)                   = 0x55caa0e38000
[pid  6036] brk(0x55caa0e8a000)         = 0x55caa0e8a000
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6ebcfe2000
[pid  6036] mprotect(0x7f6ebcfe3000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6039 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6ebc7e1000
[pid  6036] mprotect(0x7f6ebc7e2000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6040 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6ead3ff000
[pid  6036] mprotect(0x7f6ead400000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6041 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eacbfe000
[pid  6036] mprotect(0x7f6eacbff000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6042 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eac3fd000
[pid  6036] mprotect(0x7f6eac3fe000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6043 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eabbfc000
[pid  6036] mprotect(0x7f6eabbfd000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6044 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eab3fb000
[pid  6036] mprotect(0x7f6eab3fc000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6045 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eaabfa000
[pid  6036] mprotect(0x7f6eaabfb000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6046 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6eaa3f9000
[pid  6036] mprotect(0x7f6eaa3fa000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6047 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6ea9bf8000
[pid  6036] mprotect(0x7f6ea9bf9000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6048 attached
[pid  6036] mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6ea93f7000
[pid  6036] mprotect(0x7f6ea93f8000, 8388608, PROT_READ|PROT_WRITE) = 0
strace: Process 6049 attached
[pid  6049] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f6ea13f7000
[pid  6049] munmap(0x7f6ea13f7000, 46174208) = 0
[pid  6049] munmap(0x7f6ea8000000, 20934656) = 0
[pid  6049] mprotect(0x7f6ea4000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid  6036] brk(NULL)                   = 0x55caa0e8a000
[pid  6036] brk(0x55caa0eab000)         = 0x55caa0eab000
[pid  6036] mmap(NULL, 2117632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6ebc5dc000
[pid  6036] munmap(0x7f6ebd7e3000, 2215936) = 0
[pid  6036] brk(NULL)                   = 0x55caa0eab000
[pid  6036] brk(0x55caa10d5000)         = 0x55caa10d5000
[pid  6036] brk(NULL)                   = 0x55caa10d5000
[pid  6036] brk(0x55caa1118000)         = 0x55caa1118000
[pid  6036] brk(NULL)                   = 0x55caa1118000
[pid  6036] brk(0x55caa115c000)         = 0x55caa115c000
[pid  6036] madvise(0x7f6ebda00000, 2097152, MADV_DODUMP) = -1 EINVAL (Invalid argument)

Any clues as to why EINVAL is returned for madvise(MADV_DODUMP)?

The code is from the mariadb-10.3 branch.

Recommended answer

The de flag in VmFlags refers to VM_DONTEXPAND, and the kernel explicitly rejects MADV_DODUMP on any mapping that has that flag set:

#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
…
    case MADV_DODUMP:
            if (new_flags & VM_SPECIAL) {
                    error = -EINVAL;
                    goto out;
            }
            new_flags &= ~VM_DONTDUMP;
            break;

This check has been present since commit 0103bd16fb90bc741c7a03fd1ea4e8a505abad23 ("mm: prepare VM_DONTDUMP for using in drivers") in 2012.

This mapping probably comes from hugetlbfs (hugetlbfs_file_mmap in fs/hugetlbfs/inode.c, which sets VM_HUGETLB | VM_DONTEXPAND on the VMA), because the ht bit is set as well.
