Simple CUDA kernel not returning values as expected

Question

So, I'm starting to get so frustrated with CUDA that I decided to write the simplest piece of code I could, just to get my bearings. But something seems to be going right over my head. In my code, I'm just adding two arrays element-wise and storing the sums in a third array, like this:

```cuda
#include <stdio.h>
#include <stdlib.h>

// One block per element: block i adds these[i] and those[i].
__global__ void add(int* these, int* those, int* answers)
{
    int tid = blockIdx.x;
    answers[tid] = these[tid] + those[tid];
}

int main()
{
    // Host arrays
    int these[50];
    int those[50];
    int answers[50];

    // Device pointers
    int *devthese;
    int *devthose;
    int *devanswers;

    cudaMalloc((void**)&devthese, 50 * sizeof(int));
    cudaMalloc((void**)&devthose, 50 * sizeof(int));
    cudaMalloc((void**)&devanswers, 50 * sizeof(int));

    // Fill the inputs so that answers[i] should be i + 2*i = 3*i
    int i;
    for(i = 0; i < 50; i++)
    {
        these[i] = i;
        those[i] = 2 * i;
    }

    cudaMemcpy(devthese, these, 50 * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(devthose, those, 50 * sizeof(int), cudaMemcpyHostToDevice);

    // Launch 50 blocks of 1 thread each
    add<<<50,1>>>(devthese, devthose, devanswers);

    // Copy the results back (this cudaMemcpy also waits for the kernel to finish)
    cudaMemcpy(answers, devanswers, 50 * sizeof(int), cudaMemcpyDeviceToHost);
    for(i = 0; i < 50; i++)
    {
        fprintf(stderr,"%i\n",answers[i]);
    }

    cudaFree(devthese);
    cudaFree(devthose);
    cudaFree(devanswers);
    return 0;
}
```

However, the int values being printed don't follow the sequence of multiples of 3 that I was expecting. Can anyone explain what is going wrong?
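
Because `these[i] = i` and `those[i] = 2 * i`, every result should be answers[i] = i + 2i = 3i, i.e. 0, 3, 6, ..., 147 for i = 0..49. A minimal host-side check, sketched here under the assumption that it is added to the question's program and called after the final `cudaMemcpy`, reports exactly which elements are wrong instead of relying on eyeballing the printed column. The function name `count_wrong_sums` is hypothetical and not part of the original code.

```cuda
#include <stdio.h>

// Hypothetical helper: count the elements where answers[i] differs from the
// expected i + 2*i == 3*i, printing the first few mismatches to stderr.
int count_wrong_sums(const int* answers, int n)
{
    int wrong = 0;
    for (int i = 0; i < n; i++) {
        if (answers[i] != 3 * i) {
            if (wrong < 5)
                fprintf(stderr, "answers[%d] = %d, expected %d\n",
                        i, answers[i], 3 * i);
            wrong++;
        }
    }
    return wrong;
}
```

For example, `if (count_wrong_sums(answers, 50) != 0) return 1;` placed after the copy-back makes the program exit with a failure status whenever any sum is wrong.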

Answer

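According to the comments, the problem was apparently that the code was compiled for the wrong target architecture, producing an executable whose kernel could not run on the OP's GPU. Because the program never checks the return values of its CUDA calls, the failed kernel launch went unnoticed, and the final cudaMemcpy copied back whatever happened to be in the uninitialized devanswers buffer.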
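Checking the return value of every CUDA runtime call turns this kind of silent failure into an explicit error message. The following is a minimal sketch of the question's program with such checks, assuming a single GPU at device 0. The `CUDA_CHECK` macro and the extra `n` bounds parameter on `add` are illustrative additions, not part of the original code; `cudaGetDeviceProperties`, `cudaGetLastError`, `cudaDeviceSynchronize` and `cudaGetErrorString` are standard CUDA runtime API calls. When the binary contains no code for the GPU's architecture, the launch check typically fails with `cudaErrorNoKernelImageForDevice` ("no kernel image is available for execution on the device"). Older toolkits commonly reported `cudaErrorInvalidDeviceFunction` ("invalid device function") instead.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
                    #call, cudaGetErrorString(err_));                      \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

#define N 50

__global__ void add(const int* these, const int* those, int* answers, int n)
{
    int tid = blockIdx.x;          // one block per element, as in the question
    if (tid < n)                   // guard in case more blocks than elements are launched
        answers[tid] = these[tid] + those[tid];
}

int main(void)
{
    // Print the compute capability the binary must target (e.g. 6.1 -> sm_61).
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    fprintf(stderr, "Device 0: %s, compute capability %d.%d\n",
            prop.name, prop.major, prop.minor);

    int these[N], those[N], answers[N];
    for (int i = 0; i < N; i++) {
        these[i] = i;
        those[i] = 2 * i;
    }

    int *devthese, *devthose, *devanswers;
    CUDA_CHECK(cudaMalloc((void**)&devthese, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devthose, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&devanswers, N * sizeof(int)));

    CUDA_CHECK(cudaMemcpy(devthese, these, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(devthose, those, N * sizeof(int), cudaMemcpyHostToDevice));

    add<<<N, 1>>>(devthese, devthose, devanswers, N);
    CUDA_CHECK(cudaGetLastError());       // launch errors, e.g. no code for this GPU
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(answers, devanswers, N * sizeof(int), cudaMemcpyDeviceToHost));
    for (int i = 0; i < N; i++)
        printf("%d\n", answers[i]);

    CUDA_CHECK(cudaFree(devthese));
    CUDA_CHECK(cudaFree(devthose));
    CUDA_CHECK(cudaFree(devanswers));
    return 0;
}
```

Built with something like `nvcc -arch=sm_61 add_checked.cu -o add_checked`, the program either prints 0, 3, 6, ..., 147 or stops at the first failing call with the runtime's error message. The file name here is hypothetical, and `sm_61` should be replaced with the compute capability the program prints (for example, 8.6 becomes `sm_86`).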

This community wiki answer has been added to get this off the unanswered queue. It can be deleted if/when the OP comes back and provides a fuller answer.