Why is CUDA pinned memory so fast?

Question

I observe a substantial speedup when I use pinned memory for CUDA data transfers. On Linux, the underlying system call for achieving this is mlock(). The mlock man page states that locking a page prevents it from being swapped out:

mlock() locks pages in the address range starting at addr and continuing for len bytes. All pages that contain a part of the specified address range are guaranteed to be resident in RAM when the call returns successfully;
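
The guarantee quoted from the man page can be checked directly on Linux. The sketch below is only an illustration of what the question describes: it assumes a glibc system, uses a page-aligned anonymous mapping (mincore() requires a page-aligned address), locks it with mlock(), and asks mincore() whether every page is resident. The helper names `alloc_pages()` and `lock_and_check_resident()` are hypothetical, and this is not a claim about how the CUDA driver pins memory internally.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and mincore() in glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: page-aligned anonymous mapping of npages pages. */
static void *alloc_pages(size_t npages) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    void *p = mmap(NULL, npages * (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Lock [buf, buf + len) with mlock(), then use mincore() to check that
 * every page in the range is resident in RAM.
 * Returns 1 if all pages are resident, 0 if any page is not, and -1 if
 * mlock()/mincore() fails (for example when RLIMIT_MEMLOCK is too small). */
static int lock_and_check_resident(void *buf, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages); /* one status byte per page */
    int result = -1;
    if (vec == NULL) return -1;
    if (mlock(buf, len) == 0) {
        if (mincore(buf, len, vec) == 0) {
            result = 1;
            for (size_t i = 0; i < npages; i++) {
                if ((vec[i] & 1) == 0) { /* low bit set means resident */
                    result = 0;
                    break;
                }
            }
        }
        munlock(buf, len);
    }
    free(vec);
    return result;
}
```

Once mlock() succeeds, the function should never report 0. Note that residency alone says nothing about transfer speed; the answer below explains where the speedup actually comes from.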

In my tests, I had a few gigabytes of free memory on my system, so there was never any risk that the memory pages could have been swapped out, yet I still observed the speedup. Can anyone explain what's really going on here? Any insight or info is much appreciated.

Answer

The CUDA driver checks whether the memory range is locked and then uses a different code path. Locked memory is guaranteed to stay resident in physical memory (RAM), so the device can fetch it without help from the CPU (DMA, a.k.a. async copy; the device only needs the list of physical pages). Non-locked memory can generate a page fault on access, and it is not necessarily stored in RAM (for example, it can be swapped out), so the driver needs to access every page of the non-locked memory, copy it into a pinned buffer, and pass that buffer to the DMA engine (a synchronous, page-by-page copy).
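
The two code paths can be made concrete with a small host-only simulation; this is not driver code. All names are hypothetical: `fake_dma()` stands in for the DMA engine, the static `staging` array plays the role of the driver's pinned bounce buffer (it is not actually pinned here), and `STAGING_BYTES` is set to 4096 to mirror the page-by-page copy described above, although a real driver may use a different chunk size. Each function returns how many DMA transfers it issued.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGING_BYTES 4096 /* hypothetical size of the pinned staging buffer */

/* Stand-in for the DMA engine: it may only read from pinned memory. */
static void fake_dma(unsigned char *dst, const unsigned char *pinned_src, size_t n) {
    memcpy(dst, pinned_src, n);
}

/* Pinned source: the device can read the user's buffer directly,
 * so the whole range goes out as a single DMA transfer. */
static size_t copy_from_pinned(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    fake_dma(dst, src, n);
    return 1;
}

/* Pageable source: the CPU first copies each chunk into the staging buffer,
 * then the chunk is sent by DMA from there. */
static size_t copy_from_pageable(unsigned char *dst, const unsigned char *src, size_t n) {
    static unsigned char staging[STAGING_BYTES]; /* plays the pinned bounce buffer */
    size_t transfers = 0;
    assert(dst != NULL && src != NULL);
    for (size_t off = 0; off < n; off += STAGING_BYTES) {
        size_t chunk = (n - off < STAGING_BYTES) ? n - off : STAGING_BYTES;
        memcpy(staging, src + off, chunk);   /* extra CPU copy; may page-fault in reality */
        fake_dma(dst + off, staging, chunk); /* must finish before staging is reused */
        transfers++;
    }
    return transfers;
}
```

In the pageable path every chunk costs an extra CPU copy, and the staging buffer cannot be refilled until the previous transfer has finished, which is why that path behaves synchronously even when the caller asked for an asynchronous copy.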

As described at http://forums.nvidia.com/index.php?showtopic=164661:

Host memory used by the asynchronous memcpy call needs to be page-locked through cudaMallocHost or cudaHostAlloc.

I also recommend checking the cudaMemcpyAsync and cudaHostAlloc manuals at developer.download.nvidia.com. The cudaHostAlloc manual says that the CUDA driver can detect pinned memory:

The driver tracks the virtual memory ranges allocated with this (cudaHostAlloc) function and automatically accelerates calls to functions such as cudaMemcpy().
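
The quoted documentation says the driver tracks the virtual ranges returned by cudaHostAlloc. A minimal sketch of that kind of bookkeeping follows, assuming a simple table lookup is enough to illustrate the idea; the driver's real data structures are not described in the documentation, and `pinned_register()`, `pinned_contains()` and `choose_path()` are hypothetical names. A copy whose host range lies entirely inside a registered range can take the direct DMA path; anything else falls back to staging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 64 /* hypothetical capacity of the range table */

/* One tracked pinned allocation: [base, base + len). */
struct pinned_range {
    uintptr_t base;
    size_t len;
};

static struct pinned_range ranges[MAX_RANGES];
static size_t n_ranges;

/* Record a pinned allocation; returns -1 if the table is full. */
static int pinned_register(const void *base, size_t len) {
    assert(base != NULL && len > 0);
    if (n_ranges == MAX_RANGES) return -1;
    ranges[n_ranges].base = (uintptr_t)base;
    ranges[n_ranges].len = len;
    n_ranges++;
    return 0;
}

/* 1 if [p, p + n) lies entirely inside one registered range, else 0.
 * Written as offset <= len && n <= len - offset to avoid overflow. */
static int pinned_contains(const void *p, size_t n) {
    uintptr_t start = (uintptr_t)p;
    for (size_t i = 0; i < n_ranges; i++) {
        uintptr_t b = ranges[i].base;
        if (start >= b) {
            size_t offset = (size_t)(start - b);
            if (offset <= ranges[i].len && n <= ranges[i].len - offset) return 1;
        }
    }
    return 0;
}

enum copy_path { PATH_DIRECT_DMA, PATH_STAGED };

/* Pick the transfer path for a host buffer, as the quoted text describes. */
static enum copy_path choose_path(const void *host_ptr, size_t n) {
    return pinned_contains(host_ptr, n) ? PATH_DIRECT_DMA : PATH_STAGED;
}
```

A buffer that only partly overlaps a registered range is treated as pageable here, which is the conservative choice for a sketch like this.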
