Default Pinned Memory Vs Zero-Copy Memory


Question

In CUDA, pinned (page-locked) host memory lets us copy data from the host to the GPU more efficiently than the default pageable memory allocated with malloc. However, there are two types of pinned memory: default pinned memory and zero-copy pinned memory.

Default pinned memory copies data from the host to the GPU roughly twice as fast as normal transfers from pageable memory (the exact speedup depends on the system), so there is definitely an advantage, provided we have enough host memory to page-lock.
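To see how large the difference is on a particular machine, the two kinds of host buffer can be timed directly. The following is a minimal sketch, not a rigorous benchmark: the 64 MiB buffer size, the `CHECK` macro, and the helper name `timeH2DCopyMs` are assumptions made for illustration, and the measured ratio will vary with the GPU, the bus, and the driver.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative helper macro: abort if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Time a single host-to-device copy of `bytes` bytes, in milliseconds.
static float timeH2DCopyMs(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, an arbitrary test size

    void *pageable = malloc(bytes);  // ordinary pageable host memory
    if (pageable == nullptr) return EXIT_FAILURE;
    void *pinned = nullptr;          // page-locked host memory
    void *device = nullptr;
    CHECK(cudaMallocHost(&pinned, bytes));
    CHECK(cudaMalloc(&device, bytes));
    memset(pageable, 1, bytes);
    memset(pinned, 1, bytes);

    // Warm-up copy so one-time initialization is not included in the timings.
    CHECK(cudaMemcpy(device, pinned, bytes, cudaMemcpyHostToDevice));

    float pageableMs = timeH2DCopyMs(device, pageable, bytes);
    float pinnedMs = timeH2DCopyMs(device, pinned, bytes);
    printf("pageable: %.3f ms, pinned: %.3f ms\n", pageableMs, pinnedMs);

    CHECK(cudaFree(device));
    CHECK(cudaFreeHost(pinned));
    free(pageable);
    return 0;
}
```

Averaging over several copies and several buffer sizes would give a more trustworthy number than this single measurement.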

With the other kind of pinned memory, zero-copy memory, we do not need to copy the data from the host to the GPU's DRAM at all. The kernels read the data directly from host memory.
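A zero-copy setup might look roughly like the sketch below. It allocates mapped pinned buffers with `cudaHostAlloc` and `cudaHostAllocMapped`, obtains device-side aliases with `cudaHostGetDevicePointer`, and launches a kernel that reads and writes host memory directly, with no `cudaMemcpy`. The kernel `scale`, the `CHECK` macro, and the problem size are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper macro: abort if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical kernel: each thread reads one element from mapped host
// memory and writes the scaled value back to mapped host memory.
__global__ void scale(const float *in, float *out, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

int main() {
    // Older toolkits require this flag before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        printf("Device 0 cannot map host memory; zero-copy unavailable.\n");
        return 0;
    }

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned host buffers that are also mapped into the device address space.
    float *hIn = nullptr, *hOut = nullptr;
    CHECK(cudaHostAlloc((void **)&hIn, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&hOut, bytes, cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    // Device-side pointers that alias the host buffers.
    float *dIn = nullptr, *dOut = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&dIn, hIn, 0));
    CHECK(cudaHostGetDevicePointer((void **)&dOut, hOut, 0));

    scale<<<(n + 255) / 256, 256>>>(dIn, dOut, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // results are visible to the host after this

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != 2.0f * hIn[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFreeHost(hIn));
    CHECK(cudaFreeHost(hOut));
    return mismatches == 0 ? 0 : EXIT_FAILURE;
}
```

On a discrete GPU every access in this kernel travels over the bus, which is why the access pattern and the amount of reuse matter so much for this approach.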

My question is: which of these pinned-memory types is the better programming practice?

Answer

I think it depends on your application (otherwise, why would they provide both ways?).

Mapped, pinned memory (zero-copy) is useful when any of the following holds:

• The GPU has no memory of its own and uses system RAM anyway.

• You load the data exactly once, but you have a lot of computation to perform on it and want to hide memory-transfer latency behind that computation.

• The host side wants to change or add data, or read the results, while the kernel is still running (e.g. for communication).

• The data does not fit into GPU memory.
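Two of these conditions, an integrated GPU that shares system RAM and data that is too large for device memory, can be checked at run time. The sketch below is only a rough heuristic under stated assumptions: the function name `zeroCopyLooksAttractive` is hypothetical, it ignores the other two bullets, and comparing against currently free device memory is a simplification.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper macro: abort if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical heuristic: returns true if the device shares host RAM
// (integrated GPU) or if `dataBytes` exceeds the free device memory.
static bool zeroCopyLooksAttractive(int device, size_t dataBytes) {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, device));
    if (!prop.canMapHostMemory) return false;  // zero-copy is not possible
    if (prop.integrated) return true;          // GPU uses system RAM anyway

    size_t freeBytes = 0, totalBytes = 0;
    CHECK(cudaSetDevice(device));
    CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));
    return dataBytes > freeBytes;              // data does not fit on the GPU
}

int main() {
    const size_t dataBytes = (size_t)256 << 20;  // 256 MiB, an arbitrary example
    bool useZeroCopy = zeroCopyLooksAttractive(0, dataBytes);
    printf("zero-copy suggested: %s\n", useZeroCopy ? "yes" : "no");
    return 0;
}
```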

Note that you can also use multiple streams to copy data and run kernels in parallel.
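The stream-based alternative could be sketched as follows: the data is split into chunks, and each chunk's host-to-device copy, kernel, and device-to-host copy are queued in its own stream so that copies for one chunk can overlap with computation on another. The kernel `addOne`, the chunk size, and the stream count are assumptions for illustration. The host buffer is pinned because `cudaMemcpyAsync` can only overlap with other work when the host memory is page-locked.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper macro: abort if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical kernel: add 1 to every element of a chunk.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int nStreams = 4;       // arbitrary
    const int chunk = 1 << 20;    // elements per stream, arbitrary
    const int n = nStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = nullptr;           // pinned host buffer
    float *d = nullptr;           // device buffer
    CHECK(cudaMallocHost((void **)&h, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&d, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Queue copy-in, kernel, and copy-out for each chunk in its own stream.
    for (int s = 0; s < nStreams; ++s) {
        const int offset = s * chunk;
        CHECK(cudaMemcpyAsync(d + offset, h + offset, chunkBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        addOne<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + offset, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return mismatches == 0 ? 0 : EXIT_FAILURE;
}
```

Whether the copies and kernels actually overlap depends on the device's copy engines; a profiler timeline is the usual way to confirm it.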

Pinned, but not mapped, memory is better when:

• You load or store the data multiple times. For example, you have several consecutive kernels performing the work in steps, so there is no need to load the data from the host every time.

• There is not much computation to perform, so loading latencies would not be hidden well.
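The multi-kernel case can be illustrated with a small sketch: the data is copied from a pinned (unmapped) host buffer to device memory once, two kernels then work on the device-resident copy in sequence, and the result is copied back once. The kernels `square` and `addOne`, the `CHECK` macro, and the sizes are assumptions for illustration. With zero-copy memory, each of these kernels would instead fetch its input from the host again.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper macro: abort if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Hypothetical step 1: square every element in place.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Hypothetical step 2: add 1 to every element in place.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int block = 256;
    const int grid = (n + block - 1) / block;

    float *h = nullptr;  // pinned, not mapped
    float *d = nullptr;  // device-resident working copy
    CHECK(cudaMallocHost((void **)&h, bytes));
    CHECK(cudaMalloc((void **)&d, bytes));
    for (int i = 0; i < n; ++i) h[i] = 3.0f;

    // One transfer in; both kernels then reuse the data in GPU memory.
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    square<<<grid, block>>>(d, n);
    CHECK(cudaGetLastError());
    addOne<<<grid, block>>>(d, n);
    CHECK(cudaGetLastError());
    // One transfer out; this cudaMemcpy also waits for the kernels.
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != 10.0f) ++mismatches;  // 3 * 3 + 1
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return mismatches == 0 ? 0 : EXIT_FAILURE;
}
```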
