How to make a vector-type value into pinned memory in CUDA
Problem Description
I have a question about making pinned memory.
I am currently using CUDA to process a large amount of data.
To reduce the run time, I figured out it is necessary to overlap memory copies with kernel launches.
After searching some texts and web pages on overlapping memory copies and kernel launches, I noticed that the host memory must be allocated with cudaMallocHost, which allocates it as pinned memory.
When the host data is an integer or a plain array, it is easy to make pinned memory.
Just like this...
cudaStream_t* streams = (cudaStream_t*)malloc(MAX_num_stream * sizeof(cudaStream_t));
for(i = 0; i < MAX_num_stream; i++)
    cudaStreamCreate(&(streams[i]));

cudaMallocHost(&departure, its_size);  // departure is now pinned host memory

for(n = 1; ... ; n++){
    cudaMemcpyAsync( ... , streams[n]);
    kernel<<< ... , ... , ... , streams[n] >>>( ... );
}
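For reference, here is a hedged sketch of the same overlap pattern with the elided pieces filled in by placeholder names. `my_kernel`, `h_departure`, `d_departure`, and `chunk` are all made up for illustration; only the stream/async-copy pattern itself is the point.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: doubles each element of one chunk.
__global__ void my_kernel(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int MAX_num_stream = 4;
    const int chunk = 1 << 20;                 // elements per stream (assumed size)

    cudaStream_t streams[MAX_num_stream];
    for (int i = 0; i < MAX_num_stream; i++)
        cudaStreamCreate(&streams[i]);

    // Pinned host buffer: required for truly asynchronous H2D copies.
    float *h_departure, *d_departure;
    cudaMallocHost(&h_departure, MAX_num_stream * chunk * sizeof(float));
    cudaMalloc(&d_departure, MAX_num_stream * chunk * sizeof(float));

    // Each stream copies its own chunk and launches a kernel on it,
    // so copy n+1 can overlap with kernel n.
    for (int n = 0; n < MAX_num_stream; n++) {
        float* h = h_departure + n * chunk;
        float* d = d_departure + n * chunk;
        cudaMemcpyAsync(d, h, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[n]);
        my_kernel<<<(chunk + 255) / 256, 256, 0, streams[n]>>>(d, chunk);
    }
    cudaDeviceSynchronize();

    cudaFree(d_departure);
    cudaFreeHost(h_departure);
    for (int i = 0; i < MAX_num_stream; i++)
        cudaStreamDestroy(streams[i]);
    return 0;
}
```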
But in my case, my host departure memory is a vector type.
And I can't find anywhere the way to turn vector-type host memory into pinned memory by using cudaMallocHost.
Help me or give some advice to solve this problem. Thank you for reading my poor English. Thanks.
Directly, you can't allocate memory for anything other than POD types using cudaMallocHost.
If you really need a std::vector which uses pinned memory, you will have to implement your own model of std::allocator which calls cudaMallocHost internally, and instantiate your std::vector using that custom allocator.
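A minimal sketch of such an allocator model might look like the following. The name pinned_allocator is made up here; only the std::allocator interface it models is standard, and error handling is reduced to throwing std::bad_alloc.

```cuda
#include <cuda_runtime.h>
#include <cstddef>   // std::size_t
#include <new>       // std::bad_alloc
#include <vector>

// Allocator model whose storage comes from cudaMallocHost (pinned memory).
template <typename T>
struct pinned_allocator {
    using value_type = T;

    pinned_allocator() = default;
    template <typename U>
    pinned_allocator(const pinned_allocator<U>&) {}

    T* allocate(std::size_t n) {
        T* p = nullptr;
        if (cudaMallocHost(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
        return p;
    }
    void deallocate(T* p, std::size_t) { cudaFreeHost(p); }
};

// All instances are interchangeable, so they compare equal.
template <typename T, typename U>
bool operator==(const pinned_allocator<T>&, const pinned_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const pinned_allocator<T>&, const pinned_allocator<U>&) { return false; }

// A vector whose backing store is pinned, so its .data() pointer
// can be safely passed to cudaMemcpyAsync:
using pinned_vector = std::vector<float, pinned_allocator<float>>;
```

With this, something like `pinned_vector departure(N);` gives you a normal std::vector whose contiguous storage is pinned, and `departure.data()` can be used as the host pointer in cudaMemcpyAsync.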
Alternatively, the Thrust template library (which ships in recent releases of the CUDA toolkit) includes an experimental pinned memory allocator which you could use with Thrust's own vector class, which is itself a model of std::vector.
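Assuming a Thrust version that still ships the experimental allocator under this header and namespace (newer releases may have moved or deprecated it), the usage might look like:

```cuda
#include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

// A host_vector whose storage is allocated with cudaMallocHost.
typedef thrust::host_vector<
    float,
    thrust::system::cuda::experimental::pinned_allocator<float>
> pinned_host_vector;

int main() {
    pinned_host_vector h(1 << 20);   // pinned host allocation
    // h.data() (or &h[0]) can now be passed to cudaMemcpyAsync
    // and the copy will be genuinely asynchronous.
    return 0;
}
```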