DMA cache coherence management


Problem Description

My question is this: how can I determine when it is safe to disable cache snooping when I am correctly using [pci_]dma_sync_single_for_{cpu,device} in my device driver?

I'm working on a device driver for a device which writes directly to RAM over PCI Express (DMA), and am concerned about managing cache coherence. There is a control bit I can set when initiating DMA to enable or disable cache snooping during the transfer; clearly, for performance, I would like to leave cache snooping disabled if at all possible.
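
For illustration, a minimal sketch of what kicking off such a transfer might look like. The register layout and bit names (DMA_CTRL, DMA_CTRL_START, DMA_CTRL_NO_SNOOP) are invented for this example; the real control bit is device-specific:

```c
#include <linux/io.h>
#include <linux/types.h>

#define DMA_CTRL          0x10        /* hypothetical control register offset */
#define DMA_CTRL_START    (1u << 0)   /* hypothetical "go" bit */
#define DMA_CTRL_NO_SNOOP (1u << 1)   /* hypothetical snoop-disable bit */

static void start_dma(void __iomem *base, bool no_snoop)
{
	u32 ctrl = DMA_CTRL_START;

	/* Disabling snooping is faster, but leaves CPU caches stale:
	 * the driver then owns the coherency problem. */
	if (no_snoop)
		ctrl |= DMA_CTRL_NO_SNOOP;

	iowrite32(ctrl, base + DMA_CTRL);
}
```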

In the interrupt routine I call pci_dma_sync_single_for_cpu() and ..._for_device() as appropriate, when switching DMA buffers, but on 32-bit Linux 2.6.18 (RHEL 5) it turns out that these commands are macros which expand to nothing ... which explains why my device returns garbage when cache snooping is disabled on this kernel!
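
For reference, that buffer-swap pattern looks roughly like the sketch below. The names my_dev, dma_buf_slot and DMA_BUF_SIZE are hypothetical, and the modern two-argument handler signature is shown (2.6.18 handlers also took a struct pt_regs *):

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

#define DMA_BUF_SIZE 4096             /* hypothetical buffer size */

struct dma_buf_slot {                 /* hypothetical */
	dma_addr_t dma_handle;
};

struct my_dev {                       /* hypothetical driver context */
	struct pci_dev *pdev;
	struct dma_buf_slot *active;
};

static irqreturn_t my_dma_irq(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;
	struct dma_buf_slot *done = dev->active;

	/* Hand the finished buffer over to the CPU: on a non-coherent
	 * mapping this invalidates stale cache lines so reads see the
	 * data the device just wrote. */
	pci_dma_sync_single_for_cpu(dev->pdev, done->dma_handle,
				    DMA_BUF_SIZE, PCI_DMA_FROMDEVICE);

	/* ... consume 'done' and point dev->active at the next buffer ... */

	/* Return ownership to the device before re-arming the engine. */
	pci_dma_sync_single_for_device(dev->pdev, done->dma_handle,
				       DMA_BUF_SIZE, PCI_DMA_FROMDEVICE);

	return IRQ_HANDLED;
}
```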

I've trawled through the history of the kernel sources, and it seems that up until 2.6.25 only 64-bit x86 had hooks for DMA synchronisation. From 2.6.26 there seems to be a generic unified indirection mechanism for DMA synchronisation (currently in include/asm-generic/dma-mapping-common.h) via fields sync_single_for_{cpu,device} of dma_map_ops, but so far I've failed to find any definitions of these operations.
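
For context, that indirection dispatches roughly as follows. This is a paraphrase of the generic wrapper, not verbatim (details vary by kernel version), and it shows why the sync silently becomes a no-op when an architecture leaves the hook NULL:

```c
/* Paraphrase of the generic wrapper in
 * include/asm-generic/dma-mapping-common.h; not a verbatim copy. */
static inline void dma_sync_single_for_cpu(struct device *dev,
					   dma_addr_t addr, size_t size,
					   enum dma_data_direction dir)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	/* If the architecture provides no hook, nothing happens:
	 * exactly the behaviour observed on cache-coherent x86. */
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(dev, addr, size, dir);
}
```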

Recommended Answer

I'm really surprised no one has answered this, so here we go on a non-Linux specific answer (I have insufficient knowledge of the Linux kernel itself to be more specific) ...

Cache snooping simply tells the DMA controller to send cache invalidation requests to all CPUs for the memory being DMAed into. This obviously adds load to the cache coherency bus, and it scales particularly badly with additional processors as not all CPUs will have a single hop connection with the DMA controller issuing the snoop. Therefore, the simple answer to "when it is safe to disable cache snooping" is when the memory being DMAed into either does not exist in any CPU cache OR its cache lines are marked as invalid. In other words, any attempt to read from the DMAed region will always result in a read from main memory.

So how do you ensure reads from a DMAed region will always go to main memory?

Back in the day before we had fancy features like DMA cache snooping, what we used to do was to pipeline DMA memory by feeding it through a series of broken up stages as follows:

Stage 1: Add "dirty" DMA memory region to the "dirty and needs to be cleaned" DMA memory list.

Stage 2: Next time the device interrupts with fresh DMA'ed data, issue an async local CPU cache invalidate for DMA segments in the "dirty and needs to be cleaned" list for all CPUs which might access those blocks (often each CPU runs its own lists made up of local memory blocks). Move said segments into a "clean" list.

Stage 3: Next DMA interrupt (which of course you're sure will not occur before the previous cache invalidate has completed), take a fresh region from the "clean" list and tell the device that its next DMA should go into that. Recycle any dirty blocks.

Stage 4: Repeat.
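
A rough sketch of how those stages might look using Linux-style linked lists. The helpers async_cache_invalidate() and device_set_next_dma(), and struct my_dev, are hypothetical placeholders for platform- and device-specific operations:

```c
#include <linux/list.h>
#include <linux/types.h>

struct my_dev;                                  /* hypothetical device context */

struct dma_region {
	struct list_head node;
	dma_addr_t       dma_handle;
	size_t           size;
};

/* Hypothetical platform/device hooks. */
extern void async_cache_invalidate(dma_addr_t handle, size_t size);
extern void device_set_next_dma(struct my_dev *dev, struct dma_region *r);

/* Stage 1 list: handed back by consumers; CPU caches may hold stale lines. */
static LIST_HEAD(dirty_list);
/* Stage 2 output: invalidation issued on a previous interrupt; safe to reuse. */
static LIST_HEAD(clean_list);

/* Stage 1: a buffer the CPU has finished with joins the dirty list. */
static void retire_region(struct dma_region *r)
{
	list_add_tail(&r->node, &dirty_list);
}

/* Stages 2 and 3, run from the DMA-completion interrupt. */
static void on_dma_interrupt(struct my_dev *dev)
{
	struct dma_region *r, *tmp;

	/* Stage 3: hand the device a region whose invalidation was issued
	 * on an earlier interrupt and has therefore already completed. */
	if (!list_empty(&clean_list)) {
		r = list_first_entry(&clean_list, struct dma_region, node);
		list_del(&r->node);
		device_set_next_dma(dev, r);
	}

	/* Stage 2: issue asynchronous invalidations for everything on the
	 * dirty list, then promote those regions to the clean list. */
	list_for_each_entry_safe(r, tmp, &dirty_list, node) {
		async_cache_invalidate(r->dma_handle, r->size);
		list_move_tail(&r->node, &clean_list);
	}
}
```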

As much as this is more work, it has several major advantages. Firstly, you can pin DMA handling to a single CPU (typically the primary CPU0) or a single SMP node, which means only a single CPU/node need worry about cache invalidation. Secondly, you give the memory subsystem much more opportunity to hide memory latencies for you by spacing out operations over time and spreading out load on the cache coherency bus. The key for performance is generally to try and make any DMA occur on a CPU as close to the relevant DMA controller as possible and into memory as close to that CPU as possible.
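
For instance, pinning the device's interrupt (and with it the invalidation work) to CPU 0 might look like the sketch below. Note that irq_set_affinity_hint() postdates the 2.6.18 kernel discussed in the question (it appeared around 2.6.35); on older kernels the equivalent knob is /proc/irq/N/smp_affinity:

```c
#include <linux/interrupt.h>
#include <linux/cpumask.h>

static int pin_dma_irq(unsigned int irq)
{
	/* Hint that this IRQ should be serviced by CPU 0 only, so a
	 * single CPU's caches are involved in the invalidation pipeline. */
	return irq_set_affinity_hint(irq, cpumask_of(0));
}
```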

If you always hand off newly-DMAed-into memory to user space and/or other CPUs, simply inject the freshly acquired memory at the front of the async cache-invalidation pipeline. Some OSs (not sure about Linux) have an optimised routine for preparing zeroed memory in advance, so the OS basically zeros memory in the background and keeps a quick-satisfy cache around; it pays to keep new memory requests below that cached amount, because zeroing memory is extremely slow. I'm not aware of any platform produced in the past ten years which uses hardware-offloaded memory zeroing, so you must assume that all fresh memory may contain valid cache lines which need invalidating.

I appreciate this only answers half your question, but it's better than nothing. Good luck!

Niall
