HugePages on Raspberry Pi 4

Question

I need help managing HugePages on a Raspberry Pi 4 running the 64-bit Raspberry Pi OS.
I did not find much reliable information online.
First, I recompiled the kernel with the Memory Management options ---> Transparent Hugepage Support option enabled. When I run the command:

grep -i huge /proc/meminfo

The output is:

AnonHugePages:    319488 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB

And when I run the command:

cat /sys/kernel/mm/transparent_hugepage/enabled

The output is:

[always] madvise never

So I think Transparent Huge Pages (AnonHugePages) are enabled. I need to use HugePages to map the largest possible contiguous memory chunk with the mmap() function, from C code:

mem = mmap(NULL,buf_size,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0);

Looking at https://www.man7.org/linux/man-pages/man2/mmap.2.html there are two flags to manage the hugepages: MAP_HUGETLB flag and MAP_HUGE_2MB, MAP_HUGE_1GB flag.

My question is: To use HugePages should I map in this way?

mem = mmap(NULL,buf_size,PROT_READ|PROT_WRITE,MAP_SHARED,MAP_HUGETLB,fd,0);

Kernel config:

CONFIG_SYS_SUPPORTS_HUGETLBFS=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
# CONFIG_HUGETLBFS is not set

Recommended answer

Huge pages are a way to improve application performance by reducing the number of TLB misses. The mechanism coalesces contiguous standard physical pages (typically 4 KB each) into a bigger one (e.g. 2 MB). Linux implements this feature in two flavors: transparent huge pages and explicit huge pages.

Transparent huge pages (THP) are managed transparently by the kernel. User space applications have no control over them. The kernel does its best to allocate huge pages whenever possible, but this is not guaranteed. Moreover, THP may introduce overhead: an underlying "garbage collector" kernel daemon named khugepaged is in charge of coalescing physical pages into huge pages. This may consume CPU time, with undesirable effects on the performance of running applications. On systems with time-critical applications, it is generally advised to deactivate THP.

THP can be disabled on the boot command line (cf. the end of this answer) or from the shell in sysfs:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
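
The active THP mode is the word shown in square brackets in the sysfs file above. As a minimal sketch, assuming the file content has already been read into a string, a C helper could extract that word as follows (thp_active_mode() is a hypothetical name chosen for this illustration, not a kernel or libc function):

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the active THP mode (the bracketed word) from a line such as
 * "always madvise [never]" into out.
 * Returns 0 on success, -1 if no bracketed word is found or if it does
 * not fit into out (including the terminating NUL).
 */
int thp_active_mode(const char *line, char *out, size_t out_size)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (lb == NULL || rb == NULL)
        return -1;

    len = (size_t)(rb - lb - 1);
    if (len + 1 > out_size)
        return -1;

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```

A caller would typically read one line from /sys/kernel/mm/transparent_hugepage/enabled with fgets() and pass it to this helper.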

N.B.: Some interesting papers exist on the performance evaluation and issues of THP.

If huge pages are required at application level (i.e. from user space), the HUGETLBFS kernel configuration option must be set to activate the hugetlbfs pseudo-filesystem (the menu in the kernel configurator is something like: "File systems" --> "Pseudo filesystems" --> "HugeTLB file system support"). In the kernel source tree this parameter is in fs/Kconfig:

config HUGETLBFS
    bool "HugeTLB file system support"
    depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \
           SYS_SUPPORTS_HUGETLBFS || BROKEN
    help
      hugetlbfs is a filesystem backing for HugeTLB pages, based on
      ramfs. For architectures that support it, say Y here and read
      <file:Documentation/admin-guide/mm/hugetlbpage.rst> for details.

      If unsure, say N.

For example, on an Ubuntu system, we can check:

$ cat /boot/config-5.4.0-53-generic | grep HUGETLBFS
CONFIG_HUGETLBFS=y

N.B.: On Raspberry Pi, the kernel can be configured to expose its configuration as /proc/config.gz, so the same check can be done with zcat. The corresponding configuration menu entries are: "General setup" --> "Kernel .config support" + "Enable access to .config through /proc/config.gz"
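
The same check can be sketched in C. The helper below is an illustration only, assuming the configuration text (for instance the output of zcat /proc/config.gz) is already in memory; config_option_enabled() is a hypothetical name. It matches whole lines, so a line such as "# CONFIG_HUGETLBFS is not set" is not mistaken for an enabled option:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if config_text contains a line that is exactly "<name>=y".
 * Commented-out entries ("# <name> is not set") never match because the
 * comparison covers the whole line.
 */
bool config_option_enabled(const char *config_text, const char *name)
{
    char wanted[128];
    size_t wanted_len;
    const char *line = config_text;
    int n = snprintf(wanted, sizeof(wanted), "%s=y", name);

    if (n < 0 || (size_t)n >= sizeof(wanted))
        return false;
    wanted_len = (size_t)n;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len == wanted_len && strncmp(line, wanted, wanted_len) == 0)
            return true;
        line = eol ? eol + 1 : NULL;
    }
    return false;
}
```

With the kernel configuration quoted in the question, this helper would report CONFIG_HUGETLBFS as not enabled, which is why explicit huge pages are unavailable there.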

When this parameter is set, hugetlbfs pseudo-filesystem is added into the kernel build (cf. fs/Makefile):

obj-$(CONFIG_HUGETLBFS)     += hugetlbfs/

The source code of hugetlbfs is located in fs/hugetlbfs/inode.c. At startup, the kernel will mount internal hugetlbfs file systems to support all the available huge page sizes for the architecture it is running on:

static int __init init_hugetlbfs_fs(void)
{
    struct vfsmount *mnt;
    struct hstate *h;
    int error;
    int i;

    if (!hugepages_supported()) {
        pr_info("disabling because there are no supported hugepage sizes\n");
        return -ENOTSUPP;
    }

    error = -ENOMEM;
    hugetlbfs_inode_cachep = kmem_cache_create("hugetlbfs_inode_cache",
                    sizeof(struct hugetlbfs_inode_info),
                    0, SLAB_ACCOUNT, init_once);
    if (hugetlbfs_inode_cachep == NULL)
        goto out;

    error = register_filesystem(&hugetlbfs_fs_type);
    if (error)
        goto out_free;

    /* default hstate mount is required */
    mnt = mount_one_hugetlbfs(&hstates[default_hstate_idx]);
    if (IS_ERR(mnt)) {
        error = PTR_ERR(mnt);
        goto out_unreg;
    }
    hugetlbfs_vfsmount[default_hstate_idx] = mnt;

    /* other hstates are optional */
    i = 0;
    for_each_hstate(h) {
        if (i == default_hstate_idx) {
            i++;
            continue;
        }

        mnt = mount_one_hugetlbfs(h);
        if (IS_ERR(mnt))
            hugetlbfs_vfsmount[i] = NULL;
        else
            hugetlbfs_vfsmount[i] = mnt;
        i++;
    }

    return 0;

 out_unreg:
    (void)unregister_filesystem(&hugetlbfs_fs_type);
 out_free:
    kmem_cache_destroy(hugetlbfs_inode_cachep);
 out:
    return error;
}

A hugetlbfs file system is a sort of RAM file system into which the kernel creates files to back the memory regions mapped by the applications.

The amount of needed huge pages can be reserved by writing the number of needed huge pages into /sys/kernel/mm/hugepages/hugepages-hugepagesize/nr_hugepages.
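
As a rough sketch of that reservation step in C (the function names nr_hugepages_path() and reserve_huge_pages() are made up for this example), the sysfs path can be built from the huge page size in kB and the page count written into it. Writing requires root privileges and a kernel built with CONFIG_HUGETLBFS:

```c
#include <stdio.h>

/*
 * Build the sysfs path of the nr_hugepages file for a huge page size
 * given in kB (e.g. 2048 for 2 MB pages).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int nr_hugepages_path(unsigned long size_kb, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size,
                     "/sys/kernel/mm/hugepages/hugepages-%lukB/nr_hugepages",
                     size_kb);

    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}

/*
 * Write the requested number of huge pages into nr_hugepages.
 * Returns 0 on success, -1 on failure (no permission, unsupported size...).
 */
int reserve_huge_pages(unsigned long size_kb, unsigned long count)
{
    char path[128];
    FILE *f;
    int ok;

    if (nr_hugepages_path(size_kb, path, sizeof(path)) != 0)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%lu\n", count) > 0;
    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A successful write does not prove that every page was reserved; reading nr_hugepages back shows how many pages the kernel could actually set aside.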

Then, mmap() is able to map some part of the application address space onto huge pages. Here is an example showing how to do it:

#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

#define HP_SIZE  (2 * 1024 * 1024) // <-- Adjust with size of the supported HP size on your system

int main(void)
{
  char *addr, *addr1;

  // Map a Huge page
  addr = mmap(NULL, HP_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED| MAP_HUGETLB, -1, 0);
  if (addr == MAP_FAILED) {
    perror("mmap()");
    return 1;
  }

  printf("Mapping located at address: %p\n", addr);

  pause();

  return 0;
}

In the preceding program, the memory pointed by addr is based on huge pages. Example of usage:

$ gcc alloc_hp.c -o alloc_hp
$ ./alloc_hp
mmap(): Cannot allocate memory
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
0
$ sudo sh -c "echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
$  cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1
$ ./alloc_hp 
Mapping located at address: 0x7f7ef6c00000

In another terminal, while the process is blocked in the pause() system call, its memory map can be inspected to verify the size of the memory page:

$ pidof alloc_hp
13009
$ cat /proc/13009/smaps
[...]
7f7ef6c00000-7f7ef6e00000 rw-s 00000000 00:0f 331939     /anon_hugepage (deleted)
Size:               2048 kB
KernelPageSize:     2048 kB   <----- The page size is 2MB
MMUPageSize:        2048 kB
[...]

In the preceding map, the file name /anon_hugepage for the huge page region is made internally by the kernel. It is marked deleted because the kernel removes the associated memory file which will make the file disappear as soon as there are no longer references on it (e.g. when the calling process ends, the underlying file is closed upon exit(), the reference counter on the file drops to 0 and the remove operation finishes to make it disappear).

On Raspberry Pi 4B, the default huge page size is 2MB but the board supports several other huge page sizes:

$ ls -l /sys/kernel/mm/hugepages
total 0
drwxr-xr-x 2 root root 0 Nov 23 14:58 hugepages-1048576kB
drwxr-xr-x 2 root root 0 Nov 23 14:58 hugepages-2048kB
drwxr-xr-x 2 root root 0 Nov 23 14:58 hugepages-32768kB
drwxr-xr-x 2 root root 0 Nov 23 14:58 hugepages-64kB

To use them, it is necessary to mount a hugetlbfs type file system corresponding to the size of the desired huge page. The kernel documentation provides details on the available mount options. For example, to mount a hugetlbfs file system on /mnt/huge with 8 Huge Pages of size 64KB, the command is:

mount -t hugetlbfs -o pagesize=64K,size=512K,min_size=512K none /mnt/huge
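
The option string of that command follows directly from the page size and the number of pages. A small sketch in C, assuming sizes are expressed in kB and using the hypothetical helper name hugetlbfs_mount_opts(), builds the same string so that it can be passed as the last argument of mount(2):

```c
#include <stdio.h>

/*
 * Build the hugetlbfs mount options for `pages` huge pages of
 * `page_size_kb` kB each, with size and min_size both set to the total.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int hugetlbfs_mount_opts(unsigned long page_size_kb, unsigned long pages,
                         char *buf, size_t buf_size)
{
    unsigned long total_kb = page_size_kb * pages;
    int n = snprintf(buf, buf_size,
                     "pagesize=%luK,size=%luK,min_size=%luK",
                     page_size_kb, total_kb, total_kb);

    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

For 8 pages of 64 kB this yields the options used in the mount command above.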

Then it is possible to map huge pages of 64KB in a user program. The following program creates the /tmp/hpfs directory on which it mounts a hugetlbfs file system with a size of 4 huge pages of 64KB. A file named /memfile_01 is created and extended to the size of 2 huge pages. The file is mapped into memory thanks to mmap() system call. It is not passed MAP_HUGETLB flag as the provided file descriptor is for a file created on a hugetlbfs filesystem. Then, the program calls pause() to suspend its execution in order to make some observations in another terminal:

#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <fcntl.h>


#define ERR(fmt, ...) do {                            \
    fprintf(stderr,                                   \
            "ERROR@%s#%d: "fmt,                       \
             __FUNCTION__, __LINE__, ## __VA_ARGS__); \
                         } while(0)


#define HP_SIZE   (64 * 1024)
#define HPFS_DIR  "/tmp/hpfs"
#define HPFS_SIZE (4 * HP_SIZE)


int main(void)
{
void *addr;
int   rc;
char  mount_opts[256];
int   fd;

  // Create the mount point (it may already exist)
  rc = mkdir(HPFS_DIR, 0777);
  if (0 != rc && EEXIST != errno) {
    ERR("mkdir(): %m (%d)\n", errno);
    return 1;
  }

  // File system of 4 huge pages of 64KB, with at least 1 page reserved
  snprintf(mount_opts, sizeof(mount_opts), "pagesize=%d,size=%d,min_size=%d", HP_SIZE, HPFS_SIZE, HP_SIZE);

  rc = mount("none", HPFS_DIR, "hugetlbfs", 0, mount_opts);
  if (0 != rc) {
    ERR("mount(): %m (%d)\n", errno);
    return 1;
  }

  fd = open(HPFS_DIR"/memfile_01", O_RDWR|O_CREAT, 0777);
  if (fd < 0) {
    ERR("open(%s): %m (%d)\n", "memfile_01", errno);
    return 1;
  }

  // Extend the file to the size of 2 huge pages
  rc = ftruncate(fd, 2 * HP_SIZE);
  if (0 != rc) {
    ERR("ftruncate(): %m (%d)\n", errno);
    return 1;
  }

  // No MAP_HUGETLB needed: the file lives on a hugetlbfs file system
  addr = mmap(NULL, 2 * HP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (MAP_FAILED == addr) {
    ERR("mmap(): %m (%d)\n", errno);
    return 1;
  }

  // The file can be closed
  rc = close(fd);
  if (0 != rc) {
    ERR("close(%d): %m (%d)\n", fd, errno);
    return 1;
  }

  pause();

  return 0;

} // main

The preceding program must be run as root as it calls mount():

$ gcc mount_tlbfs.c -o mount_tlbfs
$ cat /sys/kernel/mm/hugepages/hugepages-64kB/nr_hugepages 
0
$ sudo sh -c "echo 8 > /sys/kernel/mm/hugepages/hugepages-64kB/nr_hugepages"
$ cat /sys/kernel/mm/hugepages/hugepages-64kB/nr_hugepages 
8
$ sudo ./mount_tlbfs 

In another terminal, the /proc/[pid]/smaps file can be displayed to check the huge page allocation. As soon as the program writes into the huge pages, the Lazy allocation mechanism triggers the effective allocation of the huge pages.
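
That check can also be automated. The following is only a sketch, assuming each line of /proc/[pid]/smaps is read separately (for example with fgets()); smaps_kernel_page_size_kb() is a hypothetical helper that extracts the value of a KernelPageSize line:

```c
#include <stdio.h>

/*
 * Parse an smaps line of the form "KernelPageSize:     2048 kB".
 * Returns the page size in kB, or -1 if the line is not a
 * KernelPageSize entry.
 */
long smaps_kernel_page_size_kb(const char *line)
{
    long kb;

    /* A space in the format matches any run of whitespace. */
    if (sscanf(line, "KernelPageSize: %ld kB", &kb) == 1)
        return kb;

    return -1;
}
```

For the mapping created by the program above, the value found on the corresponding KernelPageSize line is expected to be 64 rather than the standard page size.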

See this article for further details.

The huge pages are made of consecutive physical memory pages. The reservation should be done early during system startup (especially on heavily loaded systems), as the physical memory may become so fragmented that it is sometimes impossible to allocate huge pages afterward. To reserve them as early as possible, use the kernel boot command line:

hugepages=  
       [HW] Number of HugeTLB pages to allocate at boot.
       If this follows hugepagesz (below), it specifies
       the number of pages of hugepagesz to be allocated.
       If this is the first HugeTLB parameter on the command
       line, it specifies the number of pages to allocate for
       the default huge page size.  See also
       Documentation/admin-guide/mm/hugetlbpage.rst.
       Format: <integer>

hugepagesz=
        [HW] The size of the HugeTLB pages.  This is used in
        conjunction with hugepages (above) to allocate huge
        pages of a specific size at boot.  The pair
        hugepagesz=X hugepages=Y can be specified once for
        each supported huge page size. Huge page sizes are
        architecture dependent.  See also
        Documentation/admin-guide/mm/hugetlbpage.rst.
        Format: size[KMG]

transparent_hugepage=
        [KNL]
        Format: [always|madvise|never]
        Can be used to control the default behavior of the system
        with respect to transparent hugepages.
        See Documentation/admin-guide/mm/transhuge.rst
        for more details.

On Raspberry Pi, the boot command line can typically be updated in /boot/cmdline.txt and the current boot command line used by the running kernel can be seen in /proc/cmdline.

Notes:

  • This recipe is explained in more detail here and here
  • There is a user space library called libhugetlbfs which offers a layer of abstraction on top of the kernel's hugetlbfs mechanism described here. It comes with library services like get_huge_pages() and accompanying tools like hugectl. The goal of this user space service is to map the heap and text+data segments of STATICALLY linked executables into huge pages (the mapping of dynamically linked programs is not supported). All of this relies on the kernel features described in this answer.
