Using move_pages() to move hugepages?

Question

This question concerns:

  1. Kernel 3.10.0-1062.4.3.el7.x86_64
  2. Non-transparent hugepages, allocated via boot parameters, which might or might not be mapped to a file (e.g. mounted hugepages)
  3. x86_64

According to this kernel source, move_pages() will call do_pages_move() to move a page, but I don't see how it indirectly calls migrate_huge_page().

So my questions are:

  1. Can move_pages() move hugepages? If yes, should the page boundary be 4KB or 2MB when passing an array of page addresses? It seems there was a patch adding support for moving hugepages 5 years ago.
  2. If move_pages() cannot move hugepages, how can I move them?
  3. After moving hugepages, can I query their NUMA IDs the same way I query regular pages, as in this answer?

Judging by the code below, I seem to be able to move hugepages via move_pages() with a page size of 2MB, but is this the correct way?

```cpp
// Build: g++ -std=c++11 -o move_huge move_huge.cpp -lnuma
#include <algorithm>   // std::fill_n
#include <cerrno>
#include <cstdint>
#include <cstdlib>     // strtoul, EXIT_FAILURE
#include <cstring>     // memset, strerror
#include <iostream>
#include <limits>
#include <numaif.h>    // move_pages, MPOL_MF_MOVE_ALL
#include <sys/mman.h>

int main(int argc, char** argv) {
        if (argc < 2) {
                std::cerr << "usage: " << argv[0] << " <destination NUMA node>" << std::endl;
                return EXIT_FAILURE;
        }
        const int32_t dst_node = strtoul(argv[1], nullptr, 10);
        const constexpr uint64_t size = 4lu * 1024 * 1024;
        const constexpr uint64_t pageSize = 2lu * 1024 * 1024;   // one entry per 2MB hugepage
        const constexpr uint32_t nPages = size / pageSize;
        int32_t status[nPages];
        std::fill_n(status, nPages, std::numeric_limits<int32_t>::min());
        void* pages[nPages];
        int32_t dst_nodes[nPages];
        // Needs free hugetlbfs pages (here reserved via boot parameters).
        void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, -1, 0);

        if (ptr == MAP_FAILED) {
                std::cerr << "failed to map hugepages: " << strerror(errno) << std::endl;
                return EXIT_FAILURE;
        }
        memset(ptr, 0x41, size);   // fault the pages in before querying or moving them
        for (uint32_t i = 0; i < nPages; i++) {
                pages[i] = static_cast<char*>(ptr) + i * pageSize;
                dst_nodes[i] = dst_node;
        }

        std::cout << "Before moving" << std::endl;

        // nodes == nullptr selects query mode: status[i] receives the node of pages[i]
        if (0 != move_pages(0, nPages, pages, nullptr, status, 0)) {
                std::cout << "failed to query pages because " << strerror(errno) << std::endl;
        }
        else {
                for (uint32_t i = 0; i < nPages; i++) {
                        std::cout << "page # " << i << " is located at NUMA node " << status[i] << std::endl;
                }
        }

        // Real move (MPOL_MF_MOVE_ALL requires CAP_SYS_NICE)
        if (0 != move_pages(0, nPages, pages, dst_nodes, status, MPOL_MF_MOVE_ALL)) {
                std::cout << "failed to move pages because " << strerror(errno) << std::endl;
                return EXIT_FAILURE;
        }

        const constexpr uint64_t smallPageSize = 4lu * 1024;
        const constexpr uint32_t nSmallPages = size / smallPageSize;
        void* smallPages[nSmallPages];
        int32_t smallStatus[nSmallPages];
        std::fill_n(smallStatus, nSmallPages, std::numeric_limits<int32_t>::min());   // every entry, not just the first
        for (uint32_t i = 0; i < nSmallPages; i++) {
                smallPages[i] = static_cast<char*>(ptr) + i * smallPageSize;
        }

        std::cout << "After moving" << std::endl;
        if (0 != move_pages(0, nSmallPages, smallPages, nullptr, smallStatus, 0)) {
                std::cout << "failed to query pages because " << strerror(errno) << std::endl;
        }
        else {
                for (uint32_t i = 0; i < nSmallPages; i++) {
                        std::cout << "page # " << i << " is located at NUMA node " << smallStatus[i] << std::endl;
                }
        }

        munmap(ptr, size);
        return 0;
}
```

And should I query the NUMA IDs at 4KB granularity (as in the code above) or at 2MB?

Answer

For the original (vanilla) 3.10 Linux kernel (not the Red Hat patched one, as I have no LXR for RHEL kernels), the move_pages syscall forces a split of the huge page (2MB; both THP and hugetlbfs styles) into small pages (4KB). move_pages processes the request in rather short chunks (around 0.5MB if I calculated correctly), and the call graph looks like:

move_pages .. -> migrate_pages -> unmap_and_move ->

```c
static int unmap_and_move(new_page_t get_new_page, unsigned long private,
            struct page *page, int force, enum migrate_mode mode)
{
    struct page *newpage = get_new_page(page, private, &result);
    ...
    if (unlikely(PageTransHuge(page)))
        if (unlikely(split_huge_page(page)))
            goto out;
    ...
}
```

PageTransHuge returns true for both kinds of hugepages (THP and hugetlbfs): https://elixir.bootlin.com/linux/v3.10/source/include/linux/page-flags.h#L411

PageTransHuge() returns true for both transparent huge and hugetlbfs pages, but not normal pages.

split_huge_page calls split_huge_page_to_list, which is documented as:

Split a hugepage into normal pages. This doesn't change the position of head page.

The split also increments the THP_SPLIT vm_event counter. The counters are exported in /proc/vmstat ("file displays various virtual memory statistics"). You can check this counter before and after your test with this UUOC command: cat /proc/vmstat | grep thp_split
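The before/after comparison can also be done from inside the test program. The sketch below assumes the usual "name value" line format of /proc/vmstat. The helper names vmstat_counter and read_vmstat_counter are hypothetical. Newer kernels report the split counters as thp_split_page and thp_split_pmd instead of thp_split, so the key may need adjusting.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the value of `key` in /proc/vmstat-formatted text ("name value" per
// line), or -1 if the key is absent.
long long vmstat_counter(std::istream& in, const std::string& key) {
    assert(!key.empty());
    std::string name;
    long long value = 0;
    while (in >> name >> value) {
        if (name == key) return value;
    }
    return -1;
}

// Reads the live counter. Returns -1 if /proc/vmstat is unreadable or lacks the
// key (e.g. a kernel without THP support, or a renamed counter).
long long read_vmstat_counter(const std::string& key) {
    std::ifstream f("/proc/vmstat");
    return vmstat_counter(f, key);
}
```

Call read_vmstat_counter("thp_split") once before and once after the move_pages call, then compare the two values.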

The 3.10 kernel does contain some hugepage migration code, the unmap_and_move_huge_page function, but move_pages does not call it. Its only user in 3.10 is migrate_huge_page, which is called only from the memory-failure handler soft_offline_huge_page (__soft_offline_page), added in 2010:

Soft offline a page, by migration or invalidation, without killing anything. This is for the case when a page is not corrupted yet (so it's still valid to access), but has had a number of corrected errors and is better taken out.

Answers:

Can move_pages() move hugepages? If yes, should the page boundary be 4KB or 2MB when passing an array of page addresses? It seems there was a patch adding support for moving hugepages 5 years ago.

In the standard 3.10 kernel, move_pages accepts an array "pages" of 4KB page pointers. It breaks (splits) each huge page into 512 small pages and then migrates the small pages. THP is very unlikely to merge them back, because move_pages requests the physical pages separately and they will almost always be non-contiguous.

Don't pass pointers at a 2MB stride. The kernel will still split every huge page you mention, but it will migrate only the first 4KB small page of each.
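Below is a sketch of building the 4KB-granular "pages" array and issuing the move. It assumes 4KB base pages. It calls the raw syscall through syscall(SYS_move_pages, ...), so libnuma is not needed, and the helper names are hypothetical. It uses MPOL_MF_MOVE, which moves only pages used exclusively by this process and, unlike MPOL_MF_MOVE_ALL, does not need CAP_SYS_NICE.

```cpp
#include <cassert>
#include <cstddef>
#include <linux/mempolicy.h>  // MPOL_MF_MOVE
#include <sys/syscall.h>      // SYS_move_pages
#include <unistd.h>           // syscall
#include <vector>

constexpr std::size_t kSmallPage = 4096;  // assumption: x86_64 base page size

// One pointer per 4KB page of [base, base + len): 512 entries per 2MB hugepage.
std::vector<void*> subpage_addresses(void* base, std::size_t len) {
    assert(len % kSmallPage == 0);
    std::vector<void*> addrs;
    addrs.reserve(len / kSmallPage);
    for (std::size_t off = 0; off < len; off += kSmallPage)
        addrs.push_back(static_cast<char*>(base) + off);
    return addrs;
}

// Asks the kernel to move every 4KB page of the region to `node`. Returns the
// syscall result (-1 on error); per-page results (node or -errno) go to `status`.
long move_region_to_node(void* base, std::size_t len, int node, std::vector<int>& status) {
    std::vector<void*> pages = subpage_addresses(base, len);
    std::vector<int> nodes(pages.size(), node);
    status.assign(pages.size(), 0);
    return syscall(SYS_move_pages, 0, pages.size(), pages.data(),
                   nodes.data(), status.data(), MPOL_MF_MOVE);
}
```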

The 2013 patch was not added to the original 3.10 kernel.

  • v2 https://lwn.net/Articles/544044/ "extend hugepage migration" (3.9);
  • v3 https://lwn.net/Articles/559575/ (3.11)
  • v4 https://lore.kernel.org/patchwork/cover/395020/ (click on Related to get access to individual patches, for example move_pages patch)

The patch seems to have been accepted in September 2013: https://github.com/torvalds/linux/search?q=+extend+hugepage+migration&type=Commits

If move_pages() cannot move hugepages, how can I move them?

move_pages will move the data of hugepages as small pages. You have two options:

  1. Allocate hugepages manually on the correct NUMA node and copy your data into them. Copy twice if you want to keep the virtual address.
  2. Update the kernel to a version that includes the patch, and use the methods and tests of the patch author, Naoya Horiguchi (JP). There is a copy of his tests at https://github.com/srikanth007m/test_hugepage_migration_extension (https://github.com/Naoya-Horiguchi/test_core is required).

https://github.com/srikanth007m/test_hugepage_migration_extension/blob/master/test_move_pages.c
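The first option above (allocate hugepages on the target node and copy) could look roughly like this. The sketch assumes free hugetlbfs pages exist on the target node. It binds the new, still untouched mapping with the raw mbind syscall and MPOL_BIND before the first write. The function name and the 1024-node mask size are hypothetical. With MPOL_BIND, a fault on a node that has no free hugepages can end in SIGBUS, so check /sys/devices/system/node/nodeN/hugepages first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>            // std::memcpy
#include <linux/mempolicy.h>  // MPOL_BIND
#include <sys/mman.h>         // mmap, munmap, MAP_HUGETLB
#include <sys/syscall.h>      // SYS_mbind
#include <unistd.h>           // syscall

constexpr int kMaxNodes = 1024;  // assumption: size of the node mask we pass
constexpr int kBitsPerLong = 8 * sizeof(unsigned long);

// Hypothetical helper: maps `len` bytes of hugetlbfs memory, binds the untouched
// range to NUMA node `node`, then copies `src` into it. `len` must be a multiple
// of the hugepage size. Returns the new mapping, or nullptr on failure.
void* copy_to_hugepages_on_node(const void* src, std::size_t len, int node) {
    assert(src != nullptr);
    if (node < 0 || node >= kMaxNodes) return nullptr;
    void* dst = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (dst == MAP_FAILED) return nullptr;  // e.g. no free hugepages reserved

    unsigned long mask[kMaxNodes / kBitsPerLong] = {};
    mask[node / kBitsPerLong] |= 1UL << (node % kBitsPerLong);
    // The kernel reads only maxnode - 1 bits of the mask, hence the + 1.
    if (syscall(SYS_mbind, dst, len, static_cast<unsigned long>(MPOL_BIND), mask,
                static_cast<unsigned long>(kMaxNodes) + 1, 0U) != 0) {
        munmap(dst, len);
        return nullptr;
    }
    std::memcpy(dst, src, len);  // first touch: hugepages are allocated on `node`
    return dst;
}
```

The "copy twice" variant, which keeps the original virtual address, is not shown. It would copy into a temporary buffer, remap the original range with mmap(MAP_FIXED | MAP_HUGETLB), bind that range the same way and copy the data back.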

I am not sure yet how to run the test or how to check that it works correctly. Running ./test_move_pages -v -m private -h 2048 on a recent kernel does not increment the THP_SPLIT counter.

His test looks very similar to ours:

  1. mmap the region.
  2. memset it to fault the pages in.
  3. Fill the pages array with pointers to small pages.
  4. Call numa_move_pages.

After moving hugepages, can I query their NUMA IDs the same way I query regular pages, as in this answer?

You can query the status of any memory by passing a correct "pages" array to the move_pages syscall in query mode (with nodes set to NULL). The array should list every small page of the memory region you want to check.
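Here is a sketch of such a query. It assumes 4KB base pages and uses the raw syscall, and query_numa_nodes is a hypothetical name. For a region backed by 2MB pages, all 512 entries that belong to one hugepage should report the same node, because a hugepage is physically contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/syscall.h>  // SYS_move_pages
#include <unistd.h>       // syscall
#include <vector>

// Query mode of move_pages: nodes == NULL, so nothing is moved and status[i]
// receives the NUMA node of the i-th 4KB page, or a negative errno such as
// -ENOENT for a page that is not present. Returns an empty vector if the
// syscall itself fails (e.g. ENOSYS without NUMA support, EPERM under seccomp).
std::vector<int> query_numa_nodes(void* base, std::size_t len) {
    const std::size_t kSmallPage = 4096;  // assumption: 4KB base pages
    assert(len % kSmallPage == 0);
    std::vector<void*> pages;
    for (std::size_t off = 0; off < len; off += kSmallPage)
        pages.push_back(static_cast<char*>(base) + off);
    std::vector<int> status(pages.size(), 0);
    if (syscall(SYS_move_pages, 0, pages.size(), pages.data(),
                static_cast<const int*>(nullptr), status.data(), 0) != 0)
        return {};
    return status;
}
```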

If you know a reliable method to check whether the memory is mapped to a huge page, you can query any small page of that huge page. A probabilistic method may also work if you can export physical addresses from the kernel to user space (for example with a small LKM). For a huge page, the virtual and physical addresses always share the 21 least significant bits. For a small page, only the low 12 bits (the page offset) are guaranteed to match, and bits 12 to 20 coincide by chance only about once in 512 tests, so checking several pages makes a false positive very unlikely. Alternatively, just write an LKM that exports the page middle directory (PMD).
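The alignment test itself is simple. The sketch below uses /proc/self/pagemap instead of an LKM as the source of physical addresses; that is a swapped-in technique, not the one proposed above. Each pagemap entry is 64 bits: bit 63 marks a present page and bits 0 to 54 hold the page frame number. On newer kernels, unprivileged readers see a PFN of 0, and the helper then returns 0. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>   // open
#include <unistd.h>  // pread, close, sysconf

constexpr uint64_t kHugeMask = (1ULL << 21) - 1;  // offset bits inside a 2MB page

// True if the virtual and physical address agree in the low 21 bits, which
// always holds for memory backed by a 2MB page.
bool low21_bits_match(uint64_t vaddr, uint64_t paddr) {
    return (vaddr & kHugeMask) == (paddr & kHugeMask);
}

// Physical address of `addr` from /proc/self/pagemap, or 0 if the page is not
// present, the file is unreadable, or the PFN is hidden from this process.
uint64_t physical_address(const void* addr) {
    const uint64_t vaddr = reinterpret_cast<uintptr_t>(addr);
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return 0;
    uint64_t entry = 0;
    const off_t off = static_cast<off_t>(vaddr / page * sizeof(entry));
    const ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(entry))) return 0;
    if (((entry >> 63) & 1) == 0) return 0;            // bit 63: page present
    const uint64_t pfn = entry & ((1ULL << 55) - 1);   // bits 0-54: PFN
    if (pfn == 0) return 0;
    return pfn * page + vaddr % page;
}
```

For example, low21_bits_match(vaddr, physical_address(p)) evaluated for several addresses p inside the region should be true for all of them if the region is backed by 2MB pages.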
