Latency in ioread


Problem description



Suppose you have a PCIe device presenting a single BAR and one DMA area declared with pci_alloc_consistent(..). The BAR's flags indicate a non-prefetchable, non-cacheable memory region.

What are the principal causes of latency in reading the DMA area, and similarly, what are the causes of latency in reading the BAR?

Thank you for answering this simple question :D!
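For context, the setup the question describes looks roughly like this in a Linux PCI driver's probe routine. This is a hedged sketch, not a working driver: error handling is largely omitted, the function name and BAR number are placeholders, and `pci_alloc_consistent()` is the legacy helper that modern kernels have replaced with `dma_alloc_coherent()`. It only compiles inside a kernel build tree.

```c
#include <linux/pci.h>
#include <linux/io.h>

/* Sketch: map BAR0 and allocate a coherent DMA buffer (error checks elided). */
static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	void __iomem *bar;   /* CPU-side mapping of the device's BAR */
	void *dma_virt;      /* kernel virtual address of the DMA area */
	dma_addr_t dma_bus;  /* bus address the device uses to reach it */

	if (pci_enable_device(pdev))
		return -ENODEV;

	/* Uncached, non-prefetchable MMIO: every read here is a PCIe round trip. */
	bar = pci_iomap(pdev, 0, 0);

	/* Coherent ("consistent") DMA area, i.e. ordinary host RAM. */
	dma_virt = pci_alloc_consistent(pdev, 4096, &dma_bus);

	/* Reading the DMA area is a normal cacheable memory load... */
	u32 from_ram = *(u32 *)dma_virt;

	/* ...while reading the BAR stalls the CPU until the completion returns. */
	u32 from_bar = ioread32(bar);

	(void)from_ram;
	(void)from_bar;
	return 0;
}
```

The two reads at the end are the two cases the question asks about, and the answer below explains why their costs differ so much.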

Solution

This smells a bit like homework, but I suspect the concepts are not well understood by many, so I'll add an answer.

The best way to think through this is to consider what needs to happen in order for a read to complete. The CPU and the device are on separate sides of the PCIe link. It's helpful to view PCI-Express as a mini network. Each link is point-to-point (like your PC connected to another PC). There may also be intermediate switches (aka bridges in PCI). In that case, it's like your PC is connected to a switch that is in turn connected to the other PC.

So, if the CPU wants to read its own memory (the "DMA" region you allocated), it's relatively fast. It has a high speed bus that is designed to make that happen fast. Also, there are multiple layers of caching built in to keep frequently (or recently) used data "close" to the CPU.
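The effect of those cache layers is easy to observe from user space. The sketch below is illustrative only (absolute numbers vary by machine): it times the per-read cost of a cache-friendly sequential scan against a dependent random pointer chase over a buffer much larger than the caches, so most chased reads have to go all the way to DRAM.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 23)  /* 8M entries * 8 bytes = 64 MiB, far larger than cache */

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average ns per read for a sequential scan (prefetcher and caches help). */
double time_sequential(void)
{
    static uint64_t buf[N];
    volatile uint64_t sink = 0;
    for (size_t i = 0; i < N; i++)
        buf[i] = i;
    double t0 = now_ns();
    for (size_t i = 0; i < N; i++)
        sink += buf[i];
    (void)sink;
    return (now_ns() - t0) / N;
}

/* Average ns per read for a dependent random chase (mostly cache misses). */
double time_chase(void)
{
    static uint64_t next[N];
    uint64_t *order = malloc(N * sizeof *order);
    /* Build a random cyclic permutation, then chase it. */
    for (size_t i = 0; i < N; i++)
        order[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        uint64_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (size_t i = 0; i < N; i++)
        next[order[i]] = order[(i + 1) % N];
    free(order);

    volatile uint64_t idx = 0;
    double t0 = now_ns();
    for (size_t i = 0; i < N; i++)
        idx = next[idx];  /* each load depends on the previous one */
    return (now_ns() - t0) / N;
}
```

On typical hardware the sequential scan costs well under a nanosecond per element while the pointer chase costs tens of nanoseconds per read; an uncached MMIO read over PCIe is slower still.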

But if the CPU wants to read from the BAR in the device, the CPU (actually the PCIe root complex integrated with the CPU) must compose a PCIe read request, send the request, and wait while the device decodes the request, accesses the BAR location and sends back the requested data. Tick tock. Your CPU is doing nothing else while it waits for this to complete.

This is pretty much analogous to asking for a web page from another computer. You formulate an HTTP request, send it and wait while the web server accesses the content, formulates a return packet and sends it to you.
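The analogy can be made concrete in user space. The sketch below is a stand-in, not real PCIe traffic: it compares a request/response round trip through the kernel over a socketpair with a plain cached memory load. The round trip plays the role of the non-posted PCIe read, where the requester cannot have its data until the completion comes back.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Average ns per request/response round trip through a local socketpair. */
double roundtrip_ns(int iters)
{
    int sv[2];
    char req = 'r', rep = 'd', buf;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1.0;
    double t0 = now_ns();
    for (int i = 0; i < iters; i++) {
        (void)!write(sv[0], &req, 1);  /* "read request" leaves the requester */
        (void)!read(sv[1], &buf, 1);   /* "device" decodes the request */
        (void)!write(sv[1], &rep, 1);  /* "completion" is sent back */
        (void)!read(sv[0], &buf, 1);   /* requester finally has its data */
    }
    double per = (now_ns() - t0) / iters;
    close(sv[0]);
    close(sv[1]);
    return per;
}

/* Average ns per plain cached memory read, for comparison. */
double local_read_ns(int iters)
{
    static volatile uint64_t cell = 42;
    volatile uint64_t sink = 0;
    double t0 = now_ns();
    for (int i = 0; i < iters; i++)
        sink += cell;
    (void)sink;
    return (now_ns() - t0) / iters;
}
```

Even with both endpoints on the same machine, the round trip costs microseconds against nanoseconds for the local load, because the requester must wait for the full request/decode/respond cycle, just as the CPU must for an uncached BAR read.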

If the device wishes to access memory residing "in" the CPU, it's pretty much the exact same thing in reverse. ("Direct memory access" just means that it doesn't need to interrupt the CPU to handle it, but something [the root complex here] is still responsible for decoding the request, fulfilling the read and sending back the resulting data.)

Also, if there are intermediate PCIe switches between CPU and device, those may add additional buffering/queuing delays (exactly as a switch or router might in a network). And any such delays are doubled since they're incurred in both directions.

Of course PCIe is very fast, so all of that happens in mere nanoseconds, but that's still orders of magnitude slower than a "local" read.

