Enabling write-combining IO access in userspace

Question

I have a PCIe device with a userspace driver. I write commands to the device through a BAR. The commands are latency-sensitive and small (~64 bytes), so I don't want to use DMA.

If I remap the physical address of the BAR in the kernel using ioremap_wc and then write 64 bytes to the BAR from inside the kernel, I can see that the 64 bytes are written as a single TLP over PCIe. If I instead let my userspace program mmap the region with the MAP_SHARED flag and then write 64 bytes, I see multiple TLPs on the PCIe bus rather than a single transaction.
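For reference, the userspace side of such a mapping might look roughly like the sketch below. It is only an illustration: the helper name map_bar and the device path passed to it are hypothetical, and the memory type of the resulting mapping (write-combining or not) is decided by the driver's mmap handler, not by anything in this code.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of a device node into the process with MAP_SHARED.
 * `path` is a hypothetical device node exported by the userspace driver
 * framework. Returns NULL on failure.
 */
static void *map_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}
```

The pointer returned here is what the 64-byte command would be written through. How the CPU issues those stores is a separate question from how the pages are mapped.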

According to the kernel PAT documentation, I should be able to export write-combined pages to userspace:

Drivers wanting to export some pages to userspace do it by using mmap interface and a combination of

1) pgprot_noncached()

2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()

With PAT support, a new API pgprot_writecombine is being added. So, drivers can continue to use the above sequence, with either pgprot_noncached() or pgprot_writecombine() in step 1, followed by step 2.

Based on this documentation, the relevant kernel code from my mmap handler looks like this:

 vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

 return io_remap_pfn_range(vma,
                           vma->vm_start,
                           info->mem[vma->vm_pgoff].addr >> PAGE_SHIFT,
                           vma->vm_end - vma->vm_start,
                           vma->vm_page_prot);

My PCIe device shows up in lspci with the BARs marked as prefetchable, as expected:

    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at d8000000 (64-bit, prefetchable) [size=32M]
    Region 2: Memory at d4000000 (64-bit, prefetchable) [size=64M]

When I call mmap from userspace, I see this log message (with the debugpat kernel boot parameter set):

reserve_memtype added [mem 0xd4000000-0xd7ffffff], track write-combining, req write-combining, ret write-combining

I can also see in /sys/kernel/debug/x86/pat_memtype_list that the PAT entry looks correct and that there are no overlapping regions:

write-combining @ 0xd4000000-0xd8000000
uncached-minus  @ 0xd8000000-0xda000000
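A check like this can be scripted when debugging. The sketch below parses one line in the format shown above and tests whether a given physical address lies in a write-combining range. The struct and function names are made up for illustration, and the line format is assumed to match the two entries above; other kernel versions may print it differently.

```c
#include <stdio.h>
#include <string.h>

/* One entry of pat_memtype_list, in the "<type> @ 0x<start>-0x<end>" form. */
struct memtype_entry {
    char type[32];
    unsigned long long start;
    unsigned long long end;   /* exclusive */
};

/* Parse a single line; returns 0 on success, -1 if it does not match. */
static int parse_memtype_line(const char *line, struct memtype_entry *e)
{
    int n = sscanf(line, "%31s @ %llx-%llx", e->type, &e->start, &e->end);
    return n == 3 ? 0 : -1;
}

/* True if addr lies inside the entry and the entry is write-combining. */
static int is_write_combining(const struct memtype_entry *e,
                              unsigned long long addr)
{
    return strcmp(e->type, "write-combining") == 0 &&
           addr >= e->start && addr < e->end;
}
```

Feeding it each line read from the debugfs file and the BAR's base address from lspci confirms that the range in question is tracked as write-combining, which is the same conclusion reached by reading the list by eye.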

I have also checked that there are no MTRR entries that would conflict with the PAT configuration. As far as I can see, everything is set up correctly for write-combining to occur in userspace. However, when I use a PCIe analyser to observe the transactions on the PCIe bus, the userspace access pattern is completely different from that of the same write performed from the kernel after an ioremap_wc call.

Why is write-combining not working as expected from userspace?

What can I do to debug further?

I'm currently running on a single-socket 6-core i7-3930K.

Answer

I don't know if this will help, but this is how I got write-combining working on PCIe. Granted, it was in kernel space, but it follows the Intel documentation. It's worth trying if you're stuck.

Global definition:

unsigned int __attribute__ ((aligned(0x20))) srcArr[ARR_SIZE];

In your function:

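/* pDestAddr must point into the write-combining mapping of the BAR. */
int *pDestAddr;
int i;

for (i = 0; i < ARR_SIZE; i++) {
    /* Non-temporal store (declared in <emmintrin.h>); bypasses the cache. */
    _mm_stream_si32(pDestAddr + i, srcArr[i]);
}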
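The same idea can be tried from userspace. The sketch below is a variation, not the code from the answer: it uses the 128-bit _mm_stream_si128 instead of _mm_stream_si32, so a 64-byte command takes four stores, and it ends with _mm_sfence. The function name wc_write64 is made up for illustration, and it assumes an x86 CPU with SSE2 and a 16-byte-aligned destination (ideally 64-byte aligned). Whether the CPU then emits a single TLP depends on the hardware and is not guaranteed by this code.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */

/*
 * Copy one 64-byte command from src to dst using non-temporal stores.
 * Assumption: dst is at least 16-byte aligned; src may be unaligned.
 */
static void wc_write64(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    /* Load everything first so the four stores are issued back to back. */
    __m128i a = _mm_loadu_si128(s + 0);
    __m128i b = _mm_loadu_si128(s + 1);
    __m128i c = _mm_loadu_si128(s + 2);
    __m128i e = _mm_loadu_si128(s + 3);

    _mm_stream_si128(d + 0, a);
    _mm_stream_si128(d + 1, b);
    _mm_stream_si128(d + 2, c);
    _mm_stream_si128(d + 3, e);

    /* Order the non-temporal stores before any later stores. */
    _mm_sfence();
}
```

Here dst would be a 64-byte-aligned offset into the mmap'd BAR. Filling one whole aligned 64-byte line in one go gives the write-combining buffer the best chance of being flushed as a single burst.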
