CUDA - 在PCI-E上传输多少速度? [英] CUDA - how much slower is transferring over PCI-E?

查看:437
本文介绍了CUDA - 在PCI-E上传输多少速度?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我将一个字节从CUDA内核传输到PCI-E到主机(零拷贝存储器),与传输200 MB这样的内容相比,它有多慢?

If I transfer a single byte from a CUDA kernel to PCI-E to the host (zero-copy memory), how much is it slow compared to transferring something like 200 Megabytes?

我想知道,因为我知道,对于CUDA内核,通过PCI-E传输是慢的,是:如果我只传输一个字节或大量的数据,它会改变任何东西吗?或者由于存储器传输是在大块中执行的,因此传输单个字节对于传输200 MB是非常昂贵和无用的。

What I would like to know, since I know that transferring over PCI-E is slow for a CUDA kernel, is: does it change anything if I transfer just a single byte or a huge amount of data? Or perhaps since memory transfers are performed in "bulks", transferring a single byte is extremely expensive and useless with respect to transferring 200 MBs?

推荐答案

希望这个pic解释一切。数据是通过CUDA示例中的 bandwidthTest 生成的。硬件环境为PCI-E v2.0,Tesla M2090和2x Xeon E5-2609。请注意,两个轴都是对数标度。

Hope this pic explain everything. The data is generated by bandwidthTest in CUDA samples. The hardware environment is PCI-E v2.0, Tesla M2090 and 2x Xeon E5-2609. Please note both axises are in log scale.

根据这个数字,我们可以看到启动传输请求的开销需要一个固定的时间。数据的回归分析给出了H2D的4.9us的估计开销时间,D2H的3.3us和D2D的3.0us。

Given this figure, we can see that the overhead of launching a transfer request takes a constant time. Regression analysis on the data gives a estimated overhead time of 4.9us for H2D, 3.3us for D2H and 3.0us for D2D.

这篇关于CUDA - 在PCI-E上传输多少速度?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆