Basic CUDA - getting kernels to run on the device using C++

Views: 214 | Posted: 2017/3/5 19:07:42 | Tags: cuda

Problem description

I'm new to CUDA and am trying to get a basic kernel to run on the device. I have compiled and run the examples, so I know the device drivers work and CUDA can run successfully. My goal is to have my C++ code call CUDA to greatly speed up a task. I've been reading a number of posts online about how to do this, specifically: Can I call cuda function calls in C++?

My question is very simple (embarrassingly so): when I compile and run my code (posted below) I get no errors, but the kernel does not appear to run. This should be trivial to fix, but after 6 hours I'm at a loss. I'd post this on the NVIDIA forums, but they're still down :/. I'm sure the answer is very basic. Any help? Below are my code, how I compile it, and the terminal output I see:

main.cpp

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern void kernel_wrapper(int *a, int *b);

int main(int argc, char *argv[]){
  int a = 2;
  int b = 3;

  printf("Input: a = %d, b = %d\n",a,b);
  kernel_wrapper(&a, &b);
  printf("Ran: a = %d, b = %d\n",a,b);
  return 0;
}

kernel.cu

#include "cuPrintf.cu"
#include <stdio.h>
__global__ void kernel(int *a, int *b){
  int tx = threadIdx.x;
  cuPrintf("tx = %d\n", tx);
  switch( tx ){
    case 0:
      *a = *a + 10;
      break;
    case 1:
      *b = *b + 3;
      break;
    default:
      break;
  }
}

void kernel_wrapper(int *a, int *b){
  cudaPrintfInit();
  //cuPrintf("Anything...?");
  printf("Anything...?\n");
  int *d_1, *d_2;
  dim3 threads( 2, 1 );
  dim3 blocks( 1, 1 );

  cudaMalloc( (void **)&d_1, sizeof(int) );
  cudaMalloc( (void **)&d_2, sizeof(int) );

  cudaMemcpy( d_1, a, sizeof(int), cudaMemcpyHostToDevice );
  cudaMemcpy( d_2, b, sizeof(int), cudaMemcpyHostToDevice );

  kernel<<< blocks, threads >>>( a, b );
  cudaMemcpy( a, d_1, sizeof(int), cudaMemcpyDeviceToHost );
  cudaMemcpy( b, d_2, sizeof(int), cudaMemcpyDeviceToHost );
  printf("Output: a = %d\n", a[0]);
  cudaFree(d_1);
  cudaFree(d_2);

  cudaPrintfDisplay(stdout, true);
  cudaPrintfEnd();
}

I compile the above code from the terminal using the commands:

g++ -c main.cpp
nvcc -c kernel.cu -I/home/clj/NVIDIA_GPU_Computing_SDK/C/src/simplePrintf
nvcc -o main main.o kernel.o

When I run the code I get the following terminal output:

$ ./main
Input: a = 2, b = 3
Anything...?
Output: a = 2
Ran: a = 2, b = 3

It's clear that main.cpp is being compiled correctly and is calling the kernel.cu code. The obvious problem is that the kernel does not appear to run. I'm sure the answer to this is basic - VERY VERY BASIC. But I don't know what's happening. Help, please?
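
One possible explanation, offered as a sketch and not a confirmed answer: in kernel_wrapper the launch passes the host pointers a and b to the kernel instead of the device copies d_1 and d_2, and no CUDA call has its return code checked, so a failed launch would go unnoticed. The revised kernel.cu below makes those two changes. It assumes a toolkit that provides cudaDeviceSynchronize and device-side printf (compute capability 2.0 or later), which stands in for cuPrintf here. The CUDA_CHECK macro is a hypothetical helper, not part of the original post.

```cuda
// kernel.cu (revised sketch). Assumptions: device-side printf is available,
// and CUDA_CHECK is a hypothetical helper introduced for illustration only.
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed CUDA runtime call and leave the enclosing void function.
// (A sketch: on failure it returns early without freeing device memory.)
#define CUDA_CHECK(call)                                        \
  do {                                                          \
    cudaError_t err_ = (call);                                  \
    if (err_ != cudaSuccess) {                                  \
      fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
              __FILE__, __LINE__, cudaGetErrorString(err_));    \
      return;                                                   \
    }                                                           \
  } while (0)

__global__ void kernel(int *a, int *b) {
  int tx = threadIdx.x;
  printf("tx = %d\n", tx);  // device-side printf instead of cuPrintf
  switch (tx) {
    case 0:
      *a = *a + 10;
      break;
    case 1:
      *b = *b + 3;
      break;
    default:
      break;
  }
}

void kernel_wrapper(int *a, int *b) {
  int *d_1 = NULL, *d_2 = NULL;
  dim3 threads(2, 1);
  dim3 blocks(1, 1);

  CUDA_CHECK(cudaMalloc((void **)&d_1, sizeof(int)));
  CUDA_CHECK(cudaMalloc((void **)&d_2, sizeof(int)));

  CUDA_CHECK(cudaMemcpy(d_1, a, sizeof(int), cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_2, b, sizeof(int), cudaMemcpyHostToDevice));

  // Pass the device pointers; the original launch passed the host pointers a and b.
  kernel<<<blocks, threads>>>(d_1, d_2);
  CUDA_CHECK(cudaGetLastError());       // catches launch configuration errors
  CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

  CUDA_CHECK(cudaMemcpy(a, d_1, sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaMemcpy(b, d_2, sizeof(int), cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaFree(d_1));
  CUDA_CHECK(cudaFree(d_2));
}
```

Under these assumptions the file should build with `nvcc -c kernel.cu` (the simplePrintf include path is no longer needed) and link against the unchanged main.o. If a call still fails, the message printed by CUDA_CHECK shows which call failed and why, where the original code failed silently.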
