Sequential access to hugepages in kernel driver
Question
I'm working on a driver that uses a buffer backed by hugepages, and I'm running into problems because those hugepages are not always sequential.
In userspace, the program allocates a big buffer backed by hugepages using the mmap syscall. The buffer is then passed to the driver through an ioctl call. The driver uses the get_user_pages function to get the pages (and thus the memory addresses) backing that buffer.
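For reference, the allocation step on the userspace side might look roughly like the sketch below. It covers only the mmap call: the ioctl that hands the address to the driver is specific to the question's driver and is not shown. The helper names alloc_huge_buffer and free_huge_buffer are hypothetical. The sketch assumes hugepages have been reserved beforehand (1 GB pages are typically reserved at boot time) and that MAP_HUGETLB selects the system's default hugepage size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous memory backed by
 * hugepages of the system's default size. Returns NULL if the mapping
 * fails, e.g. because no hugepages are reserved. */
void *alloc_huge_buffer(size_t len)
{
    assert(len > 0); /* mmap() rejects a zero length */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a buffer from alloc_huge_buffer().
 * For hugetlb mappings, `len` must be a multiple of the hugepage size. */
int free_huge_buffer(void *p, size_t len)
{
    return munmap(p, len);
}
```

The pointer returned by alloc_huge_buffer() is what the program would then hand to the driver through its ioctl call.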
This works perfectly with a buffer size of 1 GB (1 hugepage). get_user_pages returns a lot of pages (HUGE_PAGE_SIZE / PAGE_SIZE), but they're all contiguous, so there's no problem. I just grab the address of the first page with page_address and work with that. The driver can also map that buffer back to userspace with remap_pfn_range when another program does an mmap call on the char device.
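The single-hugepage case works because of a condition that can be written down as a check: page_address() of the first page is only usable as the base of the whole buffer if every page returned by get_user_pages is the frame right after the previous one. The sketch below models the returned struct page pointers as plain page frame numbers (in a driver they would come from page_to_pfn()). The helper names are hypothetical, and 4 KB base pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE (4UL << 10)   /* assumed 4 KB base pages */
#define SKETCH_HUGE_SIZE (1UL << 30)   /* 1 GB hugepages, as in the question */

/* Number of base pages needed to cover `bytes`, rounded up. */
size_t pages_for(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* True if pfns[0..n-1] are consecutive page frames, i.e. the buffer is
 * physically contiguous and can be addressed from its first page. */
bool is_physically_contiguous(const unsigned long *pfns, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (pfns[i] != pfns[i - 1] + 1)
            return false;
    return true;
}
```

With 1 GB hugepages and 4 KB base pages, pages_for() gives 262,144 pages per hugepage, which is the HUGE_PAGE_SIZE / PAGE_SIZE count mentioned above.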
However, things get complicated when the buffer is backed by more than one hugepage. It seems that the kernel can return a buffer backed by non-sequential hugepages. For example, if the hugepage pool's layout is something like this
+------+------+------+------+
| HP 1 | HP 2 | HP 3 | HP 4 |
+------+------+------+------+
then a request for a hugepage-backed buffer could be fulfilled by reserving HP1 and HP4, or maybe HP3 and then HP2. In the last case, when I get the pages with get_user_pages, the address of page 0 is actually 1 GB higher than the address of page 262,144 (the head of the next hugepage).
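The HP3-then-HP2 case can be reproduced with a little address arithmetic. Purely for illustration, the sketch assumes that hugepage HPn in the diagram starts at physical address (n - 1) GB and that base pages are 4 KB. phys_of_page() is a hypothetical helper that returns the physical address of the i-th page get_user_pages would report, given the order of the hugepages behind the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ        (UINT64_C(4) << 10)  /* assumed 4 KB base pages */
#define HUGE_SZ        (UINT64_C(1) << 30)  /* 1 GB hugepages */
#define PAGES_PER_HUGE (HUGE_SZ / PAGE_SZ)  /* 262144 */

/* hp_phys[k] is the physical start of the hugepage backing bytes
 * [k * HUGE_SZ, (k + 1) * HUGE_SZ) of the user buffer. */
uint64_t phys_of_page(const uint64_t *hp_phys, uint64_t page_index)
{
    uint64_t hp = page_index / PAGES_PER_HUGE;      /* which hugepage */
    uint64_t within = page_index % PAGES_PER_HUGE;  /* page inside it */
    return hp_phys[hp] + within * PAGE_SZ;
}
```

With the buffer backed by HP3 (at 2 GB) and then HP2 (at 1 GB), page 0 is at 2 GB and page 262,144 is at 1 GB, so page 0 sits 1 GB above the head of the next hugepage, as described in the question.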
Is there any way to access those pages sequentially? I tried reordering the addresses to find the lowest one so I could use the whole buffer (e.g., if the kernel gives me a buffer backed by HP3 and then HP2, I use the address of HP2 as the base address), but that seems to scramble the data as seen from userspace (offset 0 in the reordered buffer may be offset 1 GB in the userspace buffer).
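The same kind of arithmetic shows why rebasing on the lowest hugepage scrambles the data. In the sketch below, hugepages are modeled by their physical start addresses, each 1 GB, and the helper name is hypothetical. It takes an offset into the naively rebased buffer (lowest hugepage address plus offset) and works out which userspace offset is actually stored there. It returns -1 when that address is not part of the buffer at all, which is what happens when the hugepages are not adjacent, as with HP1 and HP4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUGE_SZ (UINT64_C(1) << 30)   /* 1 GB hugepages */

/* hp_phys[k]: physical start of the hugepage backing user bytes
 * [k * HUGE_SZ, (k + 1) * HUGE_SZ). Returns the user offset stored at
 * (lowest hp_phys) + rebased_off, or -1 if no hugepage of the buffer
 * covers that physical address. */
int64_t user_offset_at_rebased(const uint64_t *hp_phys, size_t n,
                               uint64_t rebased_off)
{
    assert(n > 0);
    uint64_t lowest = hp_phys[0];
    for (size_t k = 1; k < n; k++)
        if (hp_phys[k] < lowest)
            lowest = hp_phys[k];

    uint64_t phys = lowest + rebased_off;
    for (size_t k = 0; k < n; k++)
        if (phys >= hp_phys[k] && phys < hp_phys[k] + HUGE_SZ)
            return (int64_t)(k * HUGE_SZ + (phys - hp_phys[k]));
    return -1;
}
```

For HP3 then HP2, rebased offset 0 holds userspace offset 1 GB and vice versa. For HP1 and HP4, rebased offset 1 GB falls into HP2, which does not belong to the buffer.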
TL;DR: Given >1 unordered hugepages, is there any way to access them sequentially in a Linux kernel driver?
By the way, I'm working on a Linux machine with the 3.8.0-29-generic kernel.
Recommended answer
Using the function suggested by CL, vm_map_ram, I was able to remap the memory so it can be accessed sequentially, regardless of the number of hugepages mapped. I leave the code here (error handling not included) in case it helps anyone.
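struct page **pages;
int retval;
int nid;
unsigned long npages;
unsigned long buffer_start = (unsigned long) huge->addr; // Address from user-space map.
void *remapped;

npages = 1 + ((bufsize - 1) / PAGE_SIZE); // Round bufsize up to whole pages.
pages = vmalloc(npages * sizeof(struct page *));

// Pin the user pages; pages[i] backs user address buffer_start + i * PAGE_SIZE.
down_read(&current->mm->mmap_sem);
retval = get_user_pages(current, current->mm, buffer_start, npages,
                        1 /* Write enable */, 0 /* Force */, pages, NULL);
up_read(&current->mm->mmap_sem);
// retval is the number of pages actually pinned; it should equal npages.

nid = page_to_nid(pages[0]); // Remap on the same NUMA node.
// Map pages[0..npages-1] in array order (= userspace order) into one
// virtually contiguous kernel range, wherever the hugepages sit physically.
remapped = vm_map_ram(pages, npages, nid, PAGE_KERNEL);

// Do work on remapped.
// When finished: vm_unmap_ram(remapped, npages), put_page() on each pinned page, vfree(pages).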
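vm_map_ram() solves the problem because it builds a new, virtually contiguous kernel mapping whose n-th page is pages[n], which is the order get_user_pages() returned them in and therefore the userspace order. The physical placement of the hugepages no longer matters. The same idea can be tried out from userspace with ordinary mmap(). The sketch below is a userspace analog, not kernel code, and the helper name map_pages_in_order is hypothetical. It stitches scattered pages of a temporary file into one contiguous virtual range in an arbitrary order using MAP_FIXED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the file pages listed in `order` (indices of page-sized chunks of
 * `fd`) back to back into one contiguous virtual range, so that page i of
 * the result is file page order[i]. Returns NULL on failure. */
void *map_pages_in_order(int fd, const size_t *order, size_t n, size_t ps)
{
    assert(n > 0 && ps > 0);

    /* Reserve a contiguous range first so the pieces land side by side. */
    char *base = mmap(NULL, n * ps, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Replace each reserved slot with the requested file page. */
    for (size_t i = 0; i < n; i++) {
        void *p = mmap(base + i * ps, ps, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, (off_t)(order[i] * ps));
        if (p == MAP_FAILED) {
            munmap(base, n * ps);
            return NULL;
        }
    }
    return base;
}
```

With order = {1, 0}, which mirrors HP3 before HP2, byte 0 of the stitched view is byte 0 of file page 1. Writes through the view land in the underlying pages, much as writes through the vm_map_ram() mapping reach the user buffer.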