Need help mapping pre-reserved **cacheable** DMA buffer on Xilinx/ARM SoC (Zynq 7000)

Question

I've got a Xilinx Zynq 7000-based board with a peripheral in the FPGA fabric that has DMA capability (on an AXI bus). We've developed a circuit and are running Linux on the ARM cores. We're having performance problems accessing a DMA buffer from user space after it's been filled by hardware.

Summary:

We have pre-reserved at boot time a section of DRAM for use as a large DMA buffer. We're apparently using the wrong APIs to map this buffer, because it appears to be uncached, and the access speed is terrible.

Even using it as a bounce buffer is untenably slow. IIUC, ARM caches are not DMA coherent, so I would really appreciate some insight on how to do the following:

  1. Map a region of DRAM into the kernel virtual address space, ensuring that it is cacheable.
  2. Ensure that mapping it into user space also has no ill effects, even if that requires us to provide the mmap call from our own driver.
  3. Explicitly invalidate the physical memory region in the cache hierarchy before doing DMA, to ensure coherency (see the sketch just after this list).
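
On point 3, the kernel's streaming DMA API is what normally performs the architecture-specific cache maintenance. A minimal sketch, assuming a `struct device *dev` for the AXI peripheral (e.g. from a platform driver) and hypothetical `buf_virt`/`buf_len` for one block inside the buffer:

```c
#include <linux/dma-mapping.h>

/* Sketch: hand one block to the FPGA DMA engine and make the result
 * visible to the CPU. On ARM, a DMA_FROM_DEVICE mapping invalidates
 * the relevant cache lines so the CPU won't read stale data. */
static int receive_block(struct device *dev, void *buf_virt, size_t buf_len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf_virt, buf_len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -EIO;

	/* ... program the FPGA DMA engine with 'handle', wait for it ... */

	/* Unmapping performs the final cache maintenance; only after
	 * this is it safe for the CPU to read buf_virt. */
	dma_unmap_single(dev, handle, buf_len, DMA_FROM_DEVICE);
	return 0;
}
```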

More information:

I've been trying to research this thoroughly before asking. Unfortunately, this being an ARM SoC/FPGA, there's very little information available on this, so I have to ask the experts directly.

Since this is an SoC, a lot of stuff is hard-coded for u-boot. For instance, the kernel and a ramdisk are loaded to specific places in DRAM before handing control over to the kernel. We've taken advantage of this to reserve a 64MB section of DRAM for a DMA buffer (it does need to be that big, which is why we pre-reserve it). There isn't any worry about conflicting memory types or the kernel stomping on this memory, because the boot parameters tell the kernel what region of DRAM it has control over.
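
For illustration, one common way to hold memory back from the kernel is the `mem=` boot argument; the numbers below are hypothetical (a 512MB board keeping the top 64MB out of the kernel's memory map):

```
setenv bootargs 'console=ttyPS0,115200 root=/dev/ram rw mem=448M'
```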

Initially, we tried to map this physical address range into kernel space using ioremap, but that appears to mark the region uncacheable, and the access speed is horrible, even if we try to use memcpy to make it a bounce buffer. We also map it into user space via /dev/mem, and I've timed memcpy at around 70MB/sec.
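
For reference, the user-space side of that experiment looks roughly like the sketch below (BUF_PHYS is a hypothetical placeholder for our reserved region). On ARM, /dev/mem mappings of memory the kernel doesn't manage typically come out uncached, which is consistent with the ~70MB/sec figure:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_PHYS 0x1C000000UL   /* hypothetical reserved region */
#define BUF_SIZE (64UL << 20)   /* 64MB */

int main(void)
{
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) { perror("open"); return 1; }

	void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, BUF_PHYS);
	if (buf == MAP_FAILED) { perror("mmap"); return 1; }

	/* ... memcpy() out of 'buf' is what we timed at ~70MB/sec ... */

	munmap(buf, BUF_SIZE);
	close(fd);
	return 0;
}
```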

Based on a fair amount of searching on this topic, it appears that although half the people out there want to use ioremap like this (which is probably where we got the idea from), ioremap is not supposed to be used for this purpose and that there are DMA-related APIs that should be used instead. Unfortunately, it appears that DMA buffer allocation is totally dynamic, and I haven't figured out how to tell it, "here's a physical address already allocated -- use that."

One document I looked at is this one, but it's way too x86 and PC-centric: https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt

And this question also comes up at the top of my searches, but there's no real answer: get the physical address of a buffer under Linux

Looking at the standard calls, dma_set_mask_and_coherent and family won't take a pre-defined address, and they want a device structure for PCI. I don't have such a structure, because this is an ARM SoC without PCI. I could manually populate such a structure, but that smells to me like abusing the API, not using it as intended.

BTW: This is a ring buffer, where we DMA data blocks into different offsets, but we align to cache line boundaries, so there is no risk of false sharing.
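
(For concreteness: the Cortex-A9 L1 caches and the PL310 L2 controller on the Zynq-7000 both use 32-byte lines. A hypothetical helper for rounding the ring-buffer offsets:)

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 32UL  /* Cortex-A9 L1 and PL310 L2 line size */

/* Round a ring-buffer offset up to the next cache-line boundary so
 * no DMA block shares a line with data the CPU may be touching. */
static inline size_t cache_align(size_t off)
{
	return (off + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1);
}
```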

Thank you a million for any help you can provide!

UPDATE: It appears that there's no such thing as a cacheable DMA buffer on ARM if you do it the normal way. Maybe if I don't make the ioremap call, the region won't be marked as uncacheable, but then I have to figure out how to do cache management on ARM, which I haven't managed to do. One of the problems is that memcpy in user space appears to perform really poorly. Is there a memcpy implementation optimized for uncached memory that I could use? Maybe I could write one. I have to figure out whether this processor has Neon.
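
(On the Neon question: the Zynq-7000's Cortex-A9 cores do include NEON. Below is a hypothetical copy loop using NEON intrinsics, built with -mfpu=neon; the idea is that 128-bit loads reduce the number of transactions to slow memory, though the benefit depends on how the region is mapped.)

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NEON-assisted copy for reading from slow/uncached
 * memory: 16 bytes per load instead of the byte/word accesses a
 * generic memcpy may fall back to. */
static void neon_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;
	size_t i;

	for (i = 0; i + 16 <= n; i += 16)
		vst1q_u8(d + i, vld1q_u8(s + i));
	for (; i < n; i++)    /* tail bytes */
		d[i] = s[i];
}
```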

Answer

Have you tried implementing your own char device with an mmap() method remapping your buffer as cacheable (by means of remap_pfn_range())?
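
A minimal sketch of such a driver, assuming hypothetical BUF_PHYS/BUF_SIZE for the reserved region (vm_pgoff and most error handling trimmed). The key point is leaving vma->vm_page_prot alone instead of applying pgprot_noncached(), so the mapping keeps normal, cacheable memory attributes:

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/miscdevice.h>

#define BUF_PHYS 0x1C000000UL   /* hypothetical: start of reserved region */
#define BUF_SIZE (64UL << 20)

static int dmabuf_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > BUF_SIZE)
		return -EINVAL;

	/* No pgprot_noncached() here: the mapping inherits normal,
	 * cacheable attributes from vma->vm_page_prot. */
	return remap_pfn_range(vma, vma->vm_start,
			       BUF_PHYS >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}

static const struct file_operations dmabuf_fops = {
	.owner = THIS_MODULE,
	.mmap  = dmabuf_mmap,
};

static struct miscdevice dmabuf_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "dmabuf",
	.fops  = &dmabuf_fops,
};

static int __init dmabuf_init(void)
{
	return misc_register(&dmabuf_dev);
}

static void __exit dmabuf_exit(void)
{
	misc_deregister(&dmabuf_dev);
}

module_init(dmabuf_init);
module_exit(dmabuf_exit);
MODULE_LICENSE("GPL");
```

User space would then mmap() this device instead of /dev/mem; the cache maintenance before and after each DMA (point 3 in the question) still has to happen on the kernel side.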
