I cannot understand the CUDA documentation in order to use math.h functions in CUDA kernels


Question

I am trying to understand how to use the math functions from the CUDA library. I am using this documentation: https://docs.nvidia.com/cuda/cuda-math-api/

I am going to describe my specific problem, but I think it generalizes to any function from the CUDA math library.

I have this code:

    double diff[(Ni+2)*(Nj+2)];

    .
    .
    .

    for (i=1; i<=Ni; i++){
        for (j=1; j<=Nj; j++){
            diff[i*(Nj+2) + j] = fabs(value1[i*(Nj+2) + j] - value2[i*(Nj+2) + j]);
        }
    }

This works when I compile and run it on the CPU.

Then I want to run this code on a GPU, so I created this kernel:

__global__ void deviceDiffKernel(int *in_1, int *in_2 , int *out, int N) {

    int idx = blockIdx.x*blockDim.x + threadIdx.x + 1;
    int idy = blockIdx.y*blockDim.y + threadIdx.y + 1;

    out[idy*N + idx] = fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);

}

Here I cannot use the std::fabs function (the compiler returns errors):

error: calling a __host__ function("std::fabs") from a __global__ function("deviceDeltaKernel") is not allowed

error: identifier "std::fabs" is undefined in device code

The documentation at the link above says to use this function:

__device__ double fabs(double x);

Of course, I cannot call it from the kernel like this:

out[idy*N + idx] = __device__ fabs(in_1[idy*N + idx] - in_2[idy*N + idx]);

Or like this:

double out[idy*N + idx] = in_1[idy*N + idx] - in_2[idy*N + idx];
__device__ fabs(out[idy*N + idx]);

Can somebody indicate how I can use it then?

*This is quite general and applies equally to all the functions in the CUDA Math link above.

Recommended answer

The kernel will compile if you cast the argument to the type indicated in the CUDA math API documentation:

#include <math.h>
__global__ void deviceDiffKernel(int *in_1, int *in_2 , int *out, int N) {

    int idx = blockIdx.x*blockDim.x + threadIdx.x + 1;
    int idy = blockIdx.y*blockDim.y + threadIdx.y + 1;

    out[idy*N + idx] = fabs((double)(in_1[idy*N + idx] - in_2[idy*N + idx]));

}

Your argument is an integer type. The compiler looks for the closest matching function prototype. Since the CUDA math API does not provide __device__ double fabs(int);, the compiler chooses a matching prototype from std, and that one is not usable in device code.
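To make the overload issue concrete, here is a minimal sketch that computes the absolute difference of two int arrays in two ways: by casting to double so that the fabs(double) prototype matches, and by calling the integer abs, which to my knowledge the CUDA math API lists among its integer functions. The names intAbsDiffKernel and intAbsDiff are hypothetical, the indexing is 1D for brevity, and CUDA error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical kernel: one thread per element, 1D indexing for brevity.
__global__ void intAbsDiffKernel(const int* a, const int* b,
                                 int* viaFabs, int* viaAbs, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // guard threads past the end of the arrays

    int d = a[i] - b[i];
    // The cast makes the argument a double, so the device version of
    // fabs(double) is an exact match.
    viaFabs[i] = (int)fabs((double)d);
    // For integer data, the integer abs avoids the floating-point round trip.
    viaAbs[i] = abs(d);
}

// Hypothetical host helper: copies the inputs to the GPU, runs the kernel,
// and returns {result via fabs, result via abs}. Error checking is omitted.
std::pair<std::vector<int>, std::vector<int>>
intAbsDiff(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int n = (int)a.size();
    if (n == 0) return {};
    size_t bytes = n * sizeof(int);

    int *dA, *dB, *dFabs, *dAbs;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dFabs, bytes);
    cudaMalloc(&dAbs, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    intAbsDiffKernel<<<blocks, threads>>>(dA, dB, dFabs, dAbs, n);

    std::vector<int> viaFabs(n), viaAbs(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(viaFabs.data(), dFabs, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(viaAbs.data(), dAbs, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dFabs);
    cudaFree(dAbs);
    return {viaFabs, viaAbs};
}
```

Both results should agree for int inputs. The cast version is mainly useful when the data really ought to be floating point, as in the CPU code from the question.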

As a general rule for these types of questions, regardless of which function you are using from the CUDA Math API, start by making sure all types (arguments, return value) match the types given for the function prototype in the math API documentation. Also note that the math API often has different functions for floating-point operations on float vs. double types. Some math API functions may even support a mixture of argument types, but it's still necessary to get a "match" so that the compiler identifies the correct function to use.
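As an illustration of the float vs. double point, the sketch below pairs each element type with the function the math API documents for it: fabsf for float and fabs for double. The kernel names and the runAbsDiff host helper are hypothetical, and error checking is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <cassert>
#include <vector>

// float data: use the single-precision function fabsf(float).
__global__ void absDiffFloat(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fabsf(a[i] - b[i]);
}

// double data: use the double-precision function fabs(double).
__global__ void absDiffDouble(const double* a, const double* b, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fabs(a[i] - b[i]);
}

// Hypothetical host helper shared by both kernels. The kernel is passed as a
// host-side function pointer; CUDA error checking is omitted.
template <typename T>
std::vector<T> runAbsDiff(void (*kernel)(const T*, const T*, T*, int),
                          const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    int n = (int)a.size();
    if (n == 0) return {};
    size_t bytes = n * sizeof(T);

    T *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    kernel<<<blocks, threads>>>(dA, dB, dOut, n);

    std::vector<T> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

A call such as runAbsDiff(absDiffFloat, a, b) with std::vector<float> inputs selects the float path, and passing absDiffDouble with std::vector<double> inputs selects the double path.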

Regarding your other usages, drop the __device__ decorator. It is not used when calling the function.
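The distinction can be sketched as follows: __device__ appears where a device function is declared or defined, and the call itself is an ordinary function call. This example assumes double arrays laid out as in the CPU code from the question, with (Ni+2)*(Nj+2) elements and a one-cell border that is skipped. The helper absDiff, the host function interiorDiff, and the 16x16 block size are hypothetical choices, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <cassert>
#include <vector>

// __device__ is an execution space qualifier: it goes on the declaration
// or definition of a function that runs on the GPU.
__device__ double absDiff(double x, double y) {
    return fabs(x - y);  // plain call, no qualifier at the call site
}

// Processes interior cells 1..Ni by 1..Nj of a (Ni+2) x (Nj+2) array.
__global__ void deviceDiffKernel(const double* in1, const double* in2,
                                 double* out, int Ni, int Nj) {
    int j = blockIdx.x * blockDim.x + threadIdx.x + 1;  // skip border column 0
    int i = blockIdx.y * blockDim.y + threadIdx.y + 1;  // skip border row 0
    if (i > Ni || j > Nj) return;                        // stay inside the interior

    int k = i * (Nj + 2) + j;
    out[k] = absDiff(in1[k], in2[k]);  // again called like any other function
}

// Hypothetical host helper. Border cells of the result are left at 0.0.
std::vector<double> interiorDiff(const std::vector<double>& v1,
                                 const std::vector<double>& v2,
                                 int Ni, int Nj) {
    size_t count = (size_t)(Ni + 2) * (Nj + 2);
    assert(Ni > 0 && Nj > 0);
    assert(v1.size() == count && v2.size() == count);
    size_t bytes = count * sizeof(double);

    double *d1, *d2, *dOut;
    cudaMalloc(&d1, bytes);
    cudaMalloc(&d2, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(d1, v1.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d2, v2.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, bytes);  // all-zero bytes represent 0.0 for double

    dim3 threads(16, 16);
    dim3 blocks((Nj + threads.x - 1) / threads.x,
                (Ni + threads.y - 1) / threads.y);
    deviceDiffKernel<<<blocks, threads>>>(d1, d2, dOut, Ni, Nj);

    std::vector<double> out(count);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dOut);
    return out;
}
```

The bounds check in the kernel matters because the grid is rounded up to whole blocks, so some threads would otherwise index past the interior.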
