CUDA: pinned memory zero-copy problems

Question

I tried the code from the question "Is CUDA pinned memory zero-copy?". The person who asked it says the program worked fine for him, but it does not behave the same way on my machine: the values do not change when I modify them in the kernel.

Basically, my problem is that my GPU does not have enough memory for the calculations I want to run. I want my program to keep the data in host memory (RAM) while still using CUDA for the computation. The program in the linked question seemed to solve my problem, but on my machine the code does not produce the output its author shows.

Any help, or a working example of zero-copy memory, would be useful.

Thanks.

__global__ void testPinnedMemory(double * mem)
{
    double currentValue = mem[threadIdx.x];
    printf("Thread id: %d, memory content: %f\n", threadIdx.x, currentValue);
    mem[threadIdx.x] = currentValue+10;
}

void test() 
{
    const size_t THREADS = 8;
    double * pinnedHostPtr;
    cudaHostAlloc((void **)&pinnedHostPtr, THREADS, cudaHostAllocDefault);

    //set memory values
    for (size_t i = 0; i < THREADS; ++i)
        pinnedHostPtr[i] = i;

    //call kernel
    dim3 threadsPerBlock(THREADS);
    dim3 numBlocks(1);
    testPinnedMemory<<< numBlocks, threadsPerBlock>>>(pinnedHostPtr);

    //read output
    printf("Data after kernel execution: ");
    for (int i = 0; i < THREADS; ++i)
        printf("%f ", pinnedHostPtr[i]);
    printf("\n");
}


Recommended answer

First of all, to allocate zero-copy memory you have to pass the cudaHostAllocMapped flag to cudaHostAlloc. The size argument is in bytes, so it must be THREADS * sizeof(double), not THREADS:

cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);
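Before relying on mapped allocations, it can help to confirm that the device supports them and that the allocation succeeded. The following is a minimal sketch, assuming device 0 is the GPU in use; the helpers deviceCanMapHostMemory and allocMappedDoubles are hypothetical names introduced here for illustration and are not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if device 0 reports that it can map pinned host memory.
bool deviceCanMapHostMemory()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return false;
    return prop.canMapHostMemory != 0;
}

// Hypothetical helper: allocates `count` doubles of mapped pinned memory.
// Returns nullptr on failure. Note that the size is given in bytes.
double* allocMappedDoubles(size_t count)
{
    double* hostPtr = nullptr;
    cudaError_t err = cudaHostAlloc((void**)&hostPtr,
                                    count * sizeof(double),
                                    cudaHostAllocMapped);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return hostPtr;
}

int main()
{
    if (!deviceCanMapHostMemory()) {
        printf("Device 0 cannot map host memory; zero-copy is unavailable.\n");
        return 1;
    }

    double* p = allocMappedDoubles(8);
    if (p == nullptr)
        return 1;

    p[7] = 1.0;       // all 8 * sizeof(double) bytes are usable from the host
    cudaFreeHost(p);  // pinned memory is released with cudaFreeHost, not free
    return 0;
}
```

Checking the return value of cudaHostAlloc is a design choice rather than a requirement, but it makes a failed allocation show up immediately instead of as unchanged values later.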

Even so, pinnedHostPtr can only be used to access the mapped memory from the host side. To access the same memory from the device, you have to obtain the device-side pointer to it, like this:

double* dPtr;
cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);

Pass this pointer as the kernel argument:

testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);

Also, you have to synchronize the host with the kernel execution before reading the updated values, because kernel launches are asynchronous. Just add a call to cudaDeviceSynchronize after the kernel launch.
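As a rough illustration of the synchronization step, the sketch below launches a kernel on mapped memory, checks for a launch error, and waits for completion before the host reads the data. The kernel addTen and the helper launchAndWait are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 10 to one element.
__global__ void addTen(double* mem)
{
    mem[threadIdx.x] += 10.0;
}

// Hypothetical helper: launches one block of n threads and waits for it.
// Returns the first error encountered, or cudaSuccess.
cudaError_t launchAndWait(double* devPtr, unsigned int n)
{
    addTen<<<1, n>>>(devPtr);
    cudaError_t err = cudaGetLastError();  // errors detected at launch time
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();        // blocks until the kernel has finished
}

int main()
{
    const unsigned int N = 8;

    double* hostPtr = nullptr;
    if (cudaHostAlloc((void**)&hostPtr, N * sizeof(double),
                      cudaHostAllocMapped) != cudaSuccess)
        return 1;
    for (unsigned int i = 0; i < N; ++i)
        hostPtr[i] = i;

    double* devPtr = nullptr;
    if (cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0) != cudaSuccess) {
        cudaFreeHost(hostPtr);
        return 1;
    }

    cudaError_t err = launchAndWait(devPtr, N);
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(hostPtr);
        return 1;
    }

    // Safe to read on the host only after the synchronization above.
    for (unsigned int i = 0; i < N; ++i)
        printf("%f ", hostPtr[i]);
    printf("\n");

    cudaFreeHost(hostPtr);
    return 0;
}
```

Without the call to cudaDeviceSynchronize, the host loop could run before the kernel has written its results, which would look like the values never changing.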

The code in the linked question works because the person who asked it is running on a 64-bit OS with a GPU of Compute Capability 2.0 and TCC enabled. This configuration automatically enables Unified Virtual Addressing (UVA), in which host and device memory share a single virtual address space instead of separate ones, so host pointers allocated with cudaHostAlloc can be passed directly to the kernel.
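To see which situation applies on a given machine, one option is to query the device properties. The sketch below assumes device 0 and uses a hypothetical helper, deviceHasUnifiedAddressing; it also compares the host pointer with the pointer returned by cudaHostGetDevicePointer, and simply reports the result rather than assuming it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if device 0 reports Unified Virtual Addressing.
bool deviceHasUnifiedAddressing()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return false;
    return prop.unifiedAddressing != 0;
}

int main()
{
    printf("Unified addressing: %s\n",
           deviceHasUnifiedAddressing() ? "yes" : "no");

    double* hostPtr = nullptr;
    if (cudaHostAlloc((void**)&hostPtr, 8 * sizeof(double),
                      cudaHostAllocMapped) != cudaSuccess)
        return 1;

    double* devPtr = nullptr;
    if (cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0) != cudaSuccess) {
        cudaFreeHost(hostPtr);
        return 1;
    }

    // Report whether the two pointers hold the same address on this system.
    printf("Host and device pointers %s\n",
           (void*)devPtr == (void*)hostPtr ? "match" : "differ");

    cudaFreeHost(hostPtr);
    return 0;
}
```

Calling cudaHostGetDevicePointer regardless of the result keeps the code portable across both kinds of configuration.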

In your case, the final code will look like this:

#include <cstdio>

__global__ void testPinnedMemory(double * mem)
{
    double currentValue = mem[threadIdx.x];
    printf("Thread id: %d, memory content: %f\n", threadIdx.x, currentValue);
    mem[threadIdx.x] = currentValue+10;
}

int main() 
{
    const size_t THREADS = 8;
    double * pinnedHostPtr;
    cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);

    //set memory values
    for (size_t i = 0; i < THREADS; ++i)
        pinnedHostPtr[i] = i;

    double* dPtr;
    cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);

    //call kernel
    dim3 threadsPerBlock(THREADS);
    dim3 numBlocks(1);
    testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);
    cudaDeviceSynchronize();

    //read output
    printf("Data after kernel execution: ");
    for (size_t i = 0; i < THREADS; ++i)
        printf("%f ", pinnedHostPtr[i]);
    printf("\n");

    //release the pinned allocation
    cudaFreeHost(pinnedHostPtr);

    return 0;
}
