Passing variables between host and device in CUDA


Question

I have the following CUDA kernel, which performs a breadth-first search.

__global__ void bfs(const Edge* edges, int* vertices, int* current_depth, bool* done){

    int e = blockDim.x * blockIdx.x + threadIdx.x;
    int vfirst = edges[e].first;
    int dfirst = vertices[vfirst];
    int vsecond = edges[e].second;
    int dsecond = vertices[vsecond];

    if((dfirst == *current_depth) && (dsecond == -1)){
        vertices[vsecond] = dfirst +1;
        *current_depth = dfirst+1;
        *done = false;
    }
    if((dsecond == *current_depth) && (dfirst == -1)){
        vertices[vfirst] = dsecond + 1;
        *current_depth = dsecond +1;
        *done = false;
    }
}

This kernel takes values that are assigned on the host, modifies them on the device, and writes them back to the host.

So I have declared the two variables and copied them to the device this way:

    bool h_done = true;
    bool* d_done;
    int* d_current_depth;
    int h_current_depth = 0;

    cudaMalloc((void**)&d_done, sizeof(bool));
    cudaMalloc((void**)&d_current_depth, sizeof(int));
    cudaMemcpy(d_done, &h_done, sizeof(bool), cudaMemcpyHostToDevice);
    cudaMemcpy(d_current_depth, &h_current_depth, sizeof(int), cudaMemcpyHostToDevice);

I then launch the kernel in a loop:

bfs<<<blocksPerGrid, threadsPerBlock>>>(h_edges, h_vertices, d_current_depth, d_done);

The code compiles and runs fine, but the host values never get modified on the device and vice versa. I have gone through the NVIDIA sample code in detail but can't seem to get this right. I'm new to CUDA. Any help is appreciated.

Recommended answer

bfs<<<blocksPerGrid, threadsPerBlock>>>(h_edges, h_vertices, d_current_depth, d_done);

is almost certainly wrong.

Unless you are using managed memory (which I doubt), h_edges and h_vertices are, going by their names, variables in host memory. You cannot pass ordinary host pointers to a kernel and dereference them in device code. Your kernel is most likely failing at run time because of this mistake.
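
For illustration, here is a minimal sketch of what passing device pointers could look like. Several things in it are assumptions, since the question does not show them: the layout of Edge, the d_edges and d_vertices buffers, the num_edges parameter, the run_bfs helper, and the tiny test graph in main. It also passes the current depth by value instead of through a pointer so that threads do not race on it; that is a design choice of this sketch, not something the question's code does.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed layout; the question does not show the definition of Edge.
struct Edge { int first; int second; };

// Variant of the kernel from the question with two changes (both assumptions):
// a bounds check on e, and current_depth passed by value.
__global__ void bfs(const Edge* edges, int num_edges, int* vertices,
                    int current_depth, bool* done) {
    int e = blockDim.x * blockIdx.x + threadIdx.x;
    if (e >= num_edges) return;  // surplus threads in the last block do nothing

    int vfirst = edges[e].first;
    int vsecond = edges[e].second;
    int dfirst = vertices[vfirst];
    int dsecond = vertices[vsecond];

    // Threads that reach the same vertex in one pass all store the same depth.
    if (dfirst == current_depth && dsecond == -1) {
        vertices[vsecond] = dfirst + 1;
        *done = false;
    }
    if (dsecond == current_depth && dfirst == -1) {
        vertices[vfirst] = dsecond + 1;
        *done = false;
    }
}

// Hypothetical host helper: copies the graph to the device, runs one kernel
// launch per BFS level, and returns the per-vertex depths.
std::vector<int> run_bfs(const std::vector<Edge>& h_edges,
                         std::vector<int> h_vertices) {
    int num_edges = static_cast<int>(h_edges.size());
    size_t edge_bytes = h_edges.size() * sizeof(Edge);
    size_t vertex_bytes = h_vertices.size() * sizeof(int);

    // Device copies of everything the kernel dereferences.
    Edge* d_edges = nullptr;
    int* d_vertices = nullptr;
    bool* d_done = nullptr;
    cudaMalloc((void**)&d_edges, edge_bytes);
    cudaMalloc((void**)&d_vertices, vertex_bytes);
    cudaMalloc((void**)&d_done, sizeof(bool));
    cudaMemcpy(d_edges, h_edges.data(), edge_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_vertices, h_vertices.data(), vertex_bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocksPerGrid = (num_edges + threadsPerBlock - 1) / threadsPerBlock;

    bool h_done = false;
    for (int depth = 0; !h_done; ++depth) {
        h_done = true;  // reset the flag on the device before every launch
        cudaMemcpy(d_done, &h_done, sizeof(bool), cudaMemcpyHostToDevice);
        bfs<<<blocksPerGrid, threadsPerBlock>>>(d_edges, num_edges, d_vertices,
                                                depth, d_done);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&h_done, d_done, sizeof(bool), cudaMemcpyDeviceToHost);
    }

    // Copy the result back; the host vector is never touched by the kernel.
    cudaMemcpy(h_vertices.data(), d_vertices, vertex_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_edges);
    cudaFree(d_vertices);
    cudaFree(d_done);
    return h_vertices;
}

int main() {
    // Hypothetical path graph 0-1-2-3 with vertex 0 as the source (depth 0).
    std::vector<Edge> edges = {{0, 1}, {1, 2}, {2, 3}};
    std::vector<int> depths = run_bfs(edges, {0, -1, -1, -1});
    for (size_t v = 0; v < depths.size(); ++v)
        printf("vertex %zu: depth %d\n", v, depths[v]);
    return 0;
}
```

The point of the sketch is only that every pointer handed to the kernel comes from cudaMalloc, and that results reach the host through an explicit cudaMemcpy with cudaMemcpyDeviceToHost. Error checking is left out here to keep it short.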

The unspecified launch error your code is reporting is most likely caused by this.
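
Errors like this are easy to miss because a kernel launch does not return a status. As a minimal sketch, assuming a hypothetical CUDA_CHECK macro, a trivial set_flag kernel, and a flag_after_kernel helper (none of which appear in the question), checking every runtime call could look like this:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: aborts with a readable message on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Trivial stand-in kernel: writes through a device pointer.
__global__ void set_flag(bool* flag) { *flag = false; }

// Round-trips a flag through the device and returns its final value.
bool flag_after_kernel() {
    bool h_flag = true;
    bool* d_flag = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_flag, sizeof(bool)));
    CUDA_CHECK(cudaMemcpy(d_flag, &h_flag, sizeof(bool), cudaMemcpyHostToDevice));

    set_flag<<<1, 1>>>(d_flag);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(&h_flag, d_flag, sizeof(bool), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_flag));
    return h_flag;
}

int main() {
    printf("flag after kernel: %s\n", flag_after_kernel() ? "true" : "false");
    return 0;
}
```

With checks like these around the launch in the question, a kernel that dereferences an invalid pointer would be reported at the cudaDeviceSynchronize call (or at the next blocking runtime call) instead of silently leaving the host values unchanged.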
