Sequential access to hugepages in kernel driver


Problem description

I'm working on a driver that uses a buffer backed by hugepages, and I'm finding some problems with the sequentiality of the hugepages.

In userspace, the program allocates a big buffer backed by hugepages using the mmap syscall. The buffer is then communicated to the driver through an ioctl call. The driver uses the get_user_pages function to get the pages backing that buffer.

This works perfectly with a buffer size of 1 GB (one hugepage). get_user_pages returns a lot of pages (HUGE_PAGE_SIZE / PAGE_SIZE), but they're all contiguous, so there's no problem. I just grab the address of the first page with page_address and work with that. The driver can also map that buffer back to userspace with remap_pfn_range when another program does an mmap call on the char device.

However, things get complicated when the buffer is backed by more than one hugepage. It seems that the kernel can return a buffer backed by non-sequential hugepages. I.e., if the hugepage pool's layout is something like this

+------+------+------+------+
| HP 1 | HP 2 | HP 3 | HP 4 |
+------+------+------+------+

, a request for a hugepage-backed buffer could be fulfilled by reserving HP1 and HP4, or maybe HP3 and then HP2. That means that when I get the pages with get_user_pages in the last case, the address of page 0 is actually 1 GB after the address of page 262,144 (the next hugepage's head).

Is there any way to sequentialize access to those pages? I tried reordering the addresses to find the lowest one so I can use the whole buffer (e.g., if the kernel gives me a buffer backed by HP3 then HP2, I use HP2's address as the base), but it seems that would scramble the data in userspace (offset 0 in that reordered buffer is maybe offset 1 GB in the userspace buffer).

TL;DR: Given >1 unordered hugepages, is there any way to access them sequentially in a Linux kernel driver?

By the way, I'm working on a Linux machine with a 3.8.0-29-generic kernel.

Recommended answer

Using the function suggested by CL, vm_map_ram, I was able to remap the memory so it can be accessed sequentially, independently of the number of hugepages mapped. I leave the code here (error control not included) in case it helps anyone.

struct page **pages;
int retval;
int nid;
unsigned long npages;
unsigned long buffer_start = (unsigned long) huge->addr; // Address from the user-space map.
void *remapped;

npages = 1 + ((bufsize - 1) / PAGE_SIZE); // Pages needed to cover bufsize (ceiling division).

pages = vmalloc(npages * sizeof(struct page *));

down_read(&current->mm->mmap_sem);
retval = get_user_pages(current, current->mm, buffer_start, npages,
                        1 /* Write enable */, 0 /* Force */, pages, NULL);
up_read(&current->mm->mmap_sem);

nid = page_to_nid(pages[0]); // Remap on the same NUMA node.

remapped = vm_map_ram(pages, npages, nid, PAGE_KERNEL);

// Do work on remapped.

// When finished, the mapping and the page pins should be released:
//   vm_unmap_ram(remapped, npages);
//   for (i = 0; i < npages; i++)
//       put_page(pages[i]);
//   vfree(pages);
