Basic CUDA - getting kernels to run on the device using C++


Question

I'm new to CUDA and trying to get a basic kernel to run on the device. I have compiled and run the SDK examples, so I know the device drivers work and CUDA can run successfully. My goal is to have my C++ code call CUDA to greatly speed up a task. I've been reading a bunch of different posts online about how to do this, specifically this one: Can I call cuda function calls in C++?

My question is very simple (embarrassingly so): when I compile and run my code (posted below), I get no errors, but the kernel does not appear to run. This should be trivial to fix, but after 6 hours I'm at a loss. I'd post this on the NVIDIA forums, but they're still down :/. I'm sure the answer is very basic. Any help? Below are my code, how I compile it, and the terminal output I see:

main.cpp

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern void kernel_wrapper(int *a, int *b);

int main(int argc, char *argv[]){
int a = 2;
int b = 3;

printf("Input: a = %d, b = %d\n",a,b);
kernel_wrapper(&a, &b);
printf("Ran: a = %d, b = %d\n",a,b);
return 0;
}

kernel.cu

#include "cuPrintf.cu"
#include <stdio.h>
__global__ void kernel(int *a, int *b){
int tx = threadIdx.x;
cuPrintf("tx = %d\n", tx);
switch( tx ){
  case 0:
    *a = *a + 10;
    break;
  case 1:
    *b = *b + 3;
    break;
  default:
    break;
  }
}

void kernel_wrapper(int *a, int *b){
  cudaPrintfInit();
  //cuPrintf("Anything...?");
  printf("Anything...?\n");
  int *d_1, *d_2;
  dim3 threads( 2, 1 );
  dim3 blocks( 1, 1 );

  cudaMalloc( (void **)&d_1, sizeof(int) );
  cudaMalloc( (void **)&d_2, sizeof(int) );

  cudaMemcpy( d_1, a, sizeof(int), cudaMemcpyHostToDevice );
  cudaMemcpy( d_2, b, sizeof(int), cudaMemcpyHostToDevice );

  kernel<<< blocks, threads >>>( a, b );
  cudaMemcpy( a, d_1, sizeof(int), cudaMemcpyDeviceToHost );
  cudaMemcpy( b, d_2, sizeof(int), cudaMemcpyDeviceToHost );
  printf("Output: a = %d\n", a[0]);
  cudaFree(d_1);
  cudaFree(d_2);

  cudaPrintfDisplay(stdout, true);
  cudaPrintfEnd();
}

I compile the above code from the terminal using these commands:

g++ -c main.cpp
nvcc -c kernel.cu -I/home/clj/NVIDIA_GPU_Computing_SDK/C/src/simplePrintf
nvcc -o main main.o kernel.o

When I run the code, I get the following terminal output:

$./main
Input: a = 2, b = 3
Anything...?
Output: a = 2
Ran: a = 2, b = 3

It's clear that main.cpp is being compiled correctly and is calling the kernel.cu code. The obvious problem is that the kernel does not appear to run. I'm sure the answer to this is basic (VERY VERY BASIC), but I don't know what's happening. Help, please?

Answer

Inside kernel_wrapper you have the following call:

kernel<<< blocks, threads >>>( a, b );

You are passing the kernel pointers to variables that live in host memory. The GPU cannot operate on them; the values the kernel works on have to live on the GPU. Passing the device pointers d_1 and d_2 instead solves the problem, and the result will be a = 12 and b = 6:

kernel<<< blocks, threads >>>( d_1, d_2 );
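A minimal sketch of a corrected kernel.cu follows, assuming a GPU of compute capability 2.0 or later so that device-side printf can stand in for the SDK's cuPrintf (which is dropped here to keep the file self-contained). It also suggests why no error was visible: the original code ignores every CUDA return value, and because kernel launches are asynchronous, a fault inside the kernel is reported by a later call such as cudaMemcpy, whose return code was discarded. The CUDA_CHECK macro is a hypothetical helper, not part of the CUDA API; cudaGetLastError and cudaDeviceSynchronize are standard runtime calls.

```cuda
// kernel.cu (sketch): device printf replaces cuPrintf; assumes sm_20 or later.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and abort if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

__global__ void kernel(int *a, int *b) {
  int tx = threadIdx.x;
  printf("tx = %d\n", tx);  // device-side printf, flushed at synchronization
  switch (tx) {
    case 0: *a += 10; break;  // thread 0 updates a
    case 1: *b += 3;  break;  // thread 1 updates b
    default: break;
  }
}

void kernel_wrapper(int *a, int *b) {
  int *d_1 = nullptr, *d_2 = nullptr;
  dim3 threads(2, 1);
  dim3 blocks(1, 1);

  CUDA_CHECK(cudaMalloc((void **)&d_1, sizeof(int)));
  CUDA_CHECK(cudaMalloc((void **)&d_2, sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_1, a, sizeof(int), cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_2, b, sizeof(int), cudaMemcpyHostToDevice));

  // Pass the DEVICE pointers, not the host pointers a and b.
  kernel<<<blocks, threads>>>(d_1, d_2);
  CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

  CUDA_CHECK(cudaMemcpy(a, d_1, sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaMemcpy(b, d_2, sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_1));
  CUDA_CHECK(cudaFree(d_2));
}
```

Linked against the unchanged main.cpp, this should print "Ran: a = 12, b = 6". With the original host-pointer launch, the same checks would stop at the first failing call instead of silently leaving a and b unchanged.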
