Enabling write-combining IO access in userspace
Question
I have a PCIe device with a userspace driver. I'm writing commands to the device through a BAR. The commands are latency sensitive and the amount of data is small (~64 bytes), so I don't want to use DMA.
If I remap the physical address of the BAR in the kernel using ioremap_wc and then write 64 bytes to the BAR inside the kernel, I can see that the 64 bytes are written as a single TLP over PCIe. If I allow my userspace program to mmap the region with the MAP_SHARED flag and then write 64 bytes, I see multiple TLPs on the PCIe bus rather than a single transaction.
According to the kernel PAT documentation I should be able to export write-combined pages through to userspace:
Drivers wanting to export some pages to userspace do it by using mmap interface and a combination of:
1) pgprot_noncached()
2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()
With PAT support, a new API pgprot_writecombine is being added. So, drivers can continue to use the above sequence, with either pgprot_noncached() or pgprot_writecombine() in step 1, followed by step 2.
Based on this documentation, the relevant kernel code from my mmap handler looks like this:
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
return io_remap_pfn_range(vma,
                          vma->vm_start,
                          info->mem[vma->vm_pgoff].addr >> PAGE_SHIFT,
                          vma->vm_end - vma->vm_start,
                          vma->vm_page_prot);
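For reference, the userspace side of such a mapping might look like the sketch below. It is an illustration only: the helper name map_region, the device path passed to it, and the convention that the mmap offset is the region index multiplied by the page size (so that it arrives in the handler as vma->vm_pgoff) are all assumptions, not details taken from the driver above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of region `index` from the device
 * node at `path`. The offset convention (index * page size) is an
 * assumption that matches a handler indexing info->mem[] by vm_pgoff.
 * Returns MAP_FAILED on error. */
static void *map_region(const char *path, unsigned index, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)index * page);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

Whether the resulting mapping is actually write-combining is decided by the page protection the kernel handler sets, not by anything in this call.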
My PCIe device shows up in lspci with the BARs marked as prefetchable as expected:
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at d8000000 (64-bit, prefetchable) [size=32M]
Region 2: Memory at d4000000 (64-bit, prefetchable) [size=64M]
When I call mmap from userspace, I see a log message (having set the debugpat kernel boot parameter):
reserve_memtype added [mem 0xd4000000-0xd7ffffff], track write-combining, req write-combining, ret write-combining
I can also see in /sys/kernel/debug/x86/pat_memtype_list that the PAT entry looks correct and that there are no overlapping regions:
write-combining @ 0xd4000000-0xd8000000
uncached-minus @ 0xd8000000-0xda000000
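A check like this can be automated. The sketch below assumes each row has exactly the "type @ 0xstart-0xend" shape shown in the listing above (other kernel versions may format the file differently), and the function name memtype_covers is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `line` (one row in the format shown above, e.g.
 * "write-combining @ 0xd4000000-0xd8000000") has memory type `type`
 * and its range covers [start, end); return 0 otherwise. */
static int memtype_covers(const char *line, const char *type,
                          uint64_t start, uint64_t end)
{
    char name[32];
    unsigned long long lo, hi;

    /* %llx accepts the optional 0x prefix */
    if (sscanf(line, "%31s @ %llx-%llx", name, &lo, &hi) != 3)
        return 0;
    return strcmp(name, type) == 0 && lo <= start && end <= hi;
}
```

Calling it on each line of the file with the BAR's start and end addresses would confirm that the whole BAR sits inside a single write-combining entry.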
I have also checked that there are no MTRR entries that would conflict with the PAT configuration. As far as I can see, everything is set up correctly for write-combining to occur in userspace. However, when I use a PCIe analyser to observe the transactions on the PCIe bus, the userspace access pattern is completely different from that of the same write performed from the kernel after an ioremap_wc call.
Why is write-combining not working as expected from userspace?
What can I do to debug further?
I'm currently running on a single socket 6-core i7-3930K.
Recommended answer
I don't know if this will help, but this is how I got write-combining working on PCIe. Granted, it was in kernel space, but this complies with the Intel documentation. It's worth trying if you're stuck.
Global definition:
unsigned int __attribute__ ((aligned(0x20))) srcArr[ARR_SIZE];
In your function:
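int *pDestAddr; /* must point into the write-combining mapping of the BAR */
int i;

for (i = 0; i < ARR_SIZE; i++) {
    /* non-temporal 32-bit store; requires <emmintrin.h> (SSE2) */
    _mm_stream_si32(pDestAddr + i, (int)srcArr[i]);
}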
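Adapted to the 64-byte commands in the question, the same idea could be sketched with four 128-bit streaming stores followed by a store fence. This is only an illustration under assumptions: the name write_cmd64 is invented, dst is assumed to be 16-byte aligned (ideally 64-byte aligned so the command falls within one write-combining buffer), and nothing here guarantees that the CPU emits a single TLP.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy one 64-byte command to `dst`.
 * `dst` must be 16-byte aligned; `src` may be unaligned. */
static void write_cmd64(void *dst, const void *src)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    int i;

    for (i = 0; i < 4; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));
    _mm_sfence(); /* order the streaming stores before later stores */
#else
    memcpy(dst, src, 64); /* portable fallback without streaming stores */
#endif
}
```

Writing the whole 64 bytes back to back, with no other stores to the mapping in between, gives the write-combining buffer the best chance of being filled completely before it is evicted.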