CUDA - how much slower is transferring over PCI-E?
Question
If I transfer a single byte from a CUDA kernel over PCI-E to the host (zero-copy memory), how slow is that compared to transferring something like 200 megabytes?
I know that transferring over PCI-E is slow for a CUDA kernel. What I would like to know is: does it make any difference whether I transfer just a single byte or a huge amount of data? Or, since memory transfers are performed in bulk, is transferring a single byte extremely expensive and wasteful compared to transferring 200 MB?
Recommended answer
Hopefully this figure explains everything. The data was generated by bandwidthTest from the CUDA samples. The hardware environment is PCI-E v2.0, a Tesla M2090, and 2x Xeon E5-2609. Note that both axes use a log scale.
From this figure, we can see that launching a transfer request incurs a constant overhead. Regression analysis on the data gives an estimated overhead of 4.9 µs for host-to-device (H2D), 3.3 µs for device-to-host (D2H), and 3.0 µs for device-to-device (D2D) transfers.
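The constant-overhead observation can be sketched as a simple linear cost model: transfer time is roughly a fixed launch overhead plus size divided by sustained bandwidth. The following is a minimal Python sketch, not a measurement. The overhead values are the regression estimates quoted above; the 6 GB/s bandwidth and the function names are assumptions made purely for illustration. The model describes the explicit copies that bandwidthTest measures, so zero-copy accesses issued from inside a kernel may behave differently.

```python
# Linear cost model for a single transfer: time = overhead + bytes / bandwidth.

# Fixed per-transfer overhead in seconds (regression estimates from the answer).
OVERHEAD_S = {"H2D": 4.9e-6, "D2H": 3.3e-6, "D2D": 3.0e-6}

# ASSUMPTION: sustained bandwidth of 6 GB/s. This is an illustrative value,
# not a number reported in the answer; substitute your own measurement.
ASSUMED_BANDWIDTH_BPS = 6.0e9


def transfer_time(num_bytes, direction="D2H", bandwidth=ASSUMED_BANDWIDTH_BPS):
    """Estimated time in seconds to move num_bytes in the given direction."""
    return OVERHEAD_S[direction] + num_bytes / bandwidth


def effective_bandwidth(num_bytes, direction="D2H", bandwidth=ASSUMED_BANDWIDTH_BPS):
    """Bytes per second actually achieved once the overhead is included."""
    return num_bytes / transfer_time(num_bytes, direction, bandwidth)


if __name__ == "__main__":
    # Compare a single byte, a page, 1 MiB, and 200 MB.
    for size in (1, 4096, 1 << 20, 200 * 10**6):
        t = transfer_time(size)
        bw = effective_bandwidth(size)
        print(f"{size:>12d} B  {t * 1e6:12.2f} us  {bw / 1e9:10.6f} GB/s")
```

Under these assumptions, a one-byte transfer costs essentially the whole overhead, while 200 MB takes roughly 33 ms. That is only about 10,000 times longer for 200 million times as much data, which is why many tiny transfers are far less efficient than one large one.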