CUDA - how much slower is transferring over PCI-E?


Problem Description

If I transfer a single byte from a CUDA kernel over PCI-E to the host (zero-copy memory), how much slower is it compared to transferring something like 200 megabytes?

What I would like to know, since I know that transferring over PCI-E is slow for a CUDA kernel, is: does it make any difference whether I transfer just a single byte or a huge amount of data? Or, since memory transfers are performed in bulk, is transferring a single byte extremely expensive and wasteful compared to transferring 200 MB?

Recommended Answer

Hopefully this picture explains everything. The data was generated by the bandwidthTest program in the CUDA samples. The hardware environment is PCI-E v2.0, a Tesla M2090, and 2x Xeon E5-2609. Note that both axes are on a log scale.

Given this figure, we can see that launching a transfer request incurs a constant overhead. Regression analysis on the data gives an estimated overhead of 4.9 µs for H2D, 3.3 µs for D2H, and 3.0 µs for D2D.
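The constant-overhead claim can be sketched as a simple linear transfer-time model, T(n) = overhead + n / bandwidth. The 4.9 µs H2D overhead comes from the regression above; the ~6 GB/s peak bandwidth for a PCI-E 2.0 x16 link is an assumed round figure, not measured on this hardware:

```python
# Linear model of a PCI-E host-to-device transfer: T(n) = overhead + n / bandwidth.
# OVERHEAD_S is the 4.9 us H2D figure from the regression in the answer;
# BANDWIDTH_BPS (~6 GB/s for PCI-E 2.0 x16) is an assumed round number.
OVERHEAD_S = 4.9e-6
BANDWIDTH_BPS = 6e9

def transfer_time(n_bytes):
    """Modelled wall-clock time to move n_bytes over the link."""
    return OVERHEAD_S + n_bytes / BANDWIDTH_BPS

def effective_bandwidth(n_bytes):
    """Achieved throughput including the fixed launch overhead."""
    return n_bytes / transfer_time(n_bytes)

# A single byte is completely dominated by the fixed overhead (~200 KB/s),
# while a 200 MB transfer approaches the link's peak bandwidth.
print(f"1 byte : {effective_bandwidth(1) / 1e3:.0f} KB/s effective")
print(f"200 MB : {effective_bandwidth(200e6) / 1e9:.2f} GB/s effective")
```

Under this model the 1-byte transfer achieves roughly 0.003% of peak bandwidth, which is why small transfers should be batched whenever possible.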

