Passing an intrinsic function as a template parameter

Question

I'm trying to pass the atomicAdd function into another function as a template parameter.

Here is my kernel 1:

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, 1);
}

Try 1:

myfunc1<<<1,1>>>(val.dev_ptr, atomicAdd);

It does not work because the compiler cannot match the expected function signature.

Try 2: First, I wrap atomicAdd in a custom function called MyAtomicAdd.

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}

Then I define a function pointer type called TAtomic and declare it as a template parameter.

typedef void (*TAtomic)(float *,float);

template<typename T, TAtomic atomicFunc>
__global__ void myfunc2(T *address) {
    atomicFunc(address, 1);
}

myfunc2<float, MyAtomicAdd><<<1,1>>>(dev_ptr);
CUDA_CHECK(cudaDeviceSynchronize());

Try 2 actually works. However, I don't want to use a typedef; I need something more generic.

Try 3: Just pass MyAtomicAdd to myfunc1.

myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
CUDA_CHECK(cudaDeviceSynchronize());

The code compiles, but when I run the program it reports this error:

"ERROR in /home/liang/groute-dev/samples/framework/pagerank.cu:70: invalid program counter (76)"

I am wondering why Try 3 doesn't work. Is there a simple or gentle way to implement this requirement? Thank you.

Recommended answer

Try 3 doesn't work because you are attempting to take the address of a __device__ function in host code, which is illegal in CUDA:

myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<float>);
                          ^
                          effectively a function pointer - address of a __device__ function

Such usage in CUDA will resolve to some sort of "address", but it is garbage. When you try to use it as an actual function entry point in device code, you get the error you encountered: invalid program counter (or, in some cases, just illegal address).

You can make your Try 3 method work (without a typedef) by wrapping the intrinsic in a functor instead of a bare __device__ function:

$ cat t48.cu
#include <stdio.h>

template<typename T>
__device__ void MyAtomicAdd(T *address, T val) {
    atomicAdd(address, val);
}


template <typename T>
struct myatomicadd
{
  __device__ T operator()(T *addr, T val){
    return atomicAdd(addr, val);
  }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}


int main(){

  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
  // Illegal: takes the address of a __device__ function in host code
  // myfunc1<<<1,1>>>(dev_ptr, MyAtomicAdd<int>);
  myfunc1<<<1,1>>>(dev_ptr, myatomicadd<int>());
  int h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  printf("h = %d\n", h);
  return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
========= ERROR SUMMARY: 0 errors
$

We can also write a slightly simpler version by making operator() a member template, so that the functor's type is inferred from the kernel's template type:

$ cat t48.cu
#include <stdio.h>

struct myatomicadd
{
  template <typename T>
  __device__ T operator()(T *addr, T val){
    return atomicAdd(addr, val);
  }
};

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}


int main(){

  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
  myfunc1<<<1,1>>>(dev_ptr, myatomicadd());
  int h = 0;
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  printf("h = %d\n", h);
  float *dev_ptrf;
  cudaMalloc(&dev_ptrf, sizeof(float));
  cudaMemset(dev_ptrf, 0, sizeof(float));
  myfunc1<<<1,1>>>(dev_ptrf, myatomicadd());
  float hf = 0;
  cudaMemcpy(&hf, dev_ptrf, sizeof(float), cudaMemcpyDeviceToHost);
  printf("hf = %f\n", hf);
  return 0;
}
$ nvcc -arch=sm_35 -o t48 t48.cu
$ cuda-memcheck ./t48
========= CUDA-MEMCHECK
h = 1
hf = 1.000000
========= ERROR SUMMARY: 0 errors
$
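
If the goal is to pass the operation purely as a template parameter, with no kernel argument at all, the same functor can be supplied as a type and constructed inside the kernel. The following is only a sketch under that assumption; the names myfunc3 and run_with_functor_type are hypothetical and do not appear in the original question or answer.

```cuda
#include <cuda_runtime.h>

// Same generic functor as above: operator() is a member template,
// so one type covers every T for which atomicAdd has an overload.
struct myatomicadd
{
  template <typename T>
  __device__ T operator()(T *addr, T val) const {
    return atomicAdd(addr, val);
  }
};

// Hypothetical kernel: the functor arrives as a template type parameter
// and is default-constructed in device code, so nothing is passed from the host.
template<typename TAtomic, typename T>
__global__ void myfunc3(T *address) {
    TAtomic atomicFunc;
    atomicFunc(address, (T)1);
}

// Hypothetical helper: launches the kernel once on a zeroed int
// and returns the value read back from device memory.
int run_with_functor_type() {
  int *dev_ptr;
  cudaMalloc(&dev_ptr, sizeof(int));
  cudaMemset(dev_ptr, 0, sizeof(int));
  myfunc3<myatomicadd, int><<<1,1>>>(dev_ptr);
  int h = 0;
  // cudaMemcpy waits for the preceding kernel before copying.
  cudaMemcpy(&h, dev_ptr, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(dev_ptr);
  return h;
}
```

Calling run_with_functor_type() from main and printing the result plays the same role as the main functions in the listings above. Because no function address is ever taken, the call inside the kernel is an ordinary direct call that the compiler is free to inline.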

More treatments of the use of device function pointers in CUDA are linked to this answer.
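
As one illustration of the device function pointer approach, a commonly used pattern is to store the function's address in a __device__ variable and copy that pointer value to the host with cudaMemcpyFromSymbol before passing it to the kernel. This is a minimal sketch, not the method from the linked material; the names AtomicFn, d_atomic_fn and run_with_device_pointer are hypothetical, and it assumes a float-only wrapper, so it still needs a function pointer typedef.

```cuda
#include <cuda_runtime.h>

// Hypothetical function pointer type for a float-only atomic operation.
typedef void (*AtomicFn)(float *, float);

__device__ void MyAtomicAdd(float *address, float val) {
    atomicAdd(address, val);
}

// The initializer is resolved for the device, so this variable holds
// a valid device entry point (unlike an address taken in host code).
__device__ AtomicFn d_atomic_fn = MyAtomicAdd;

template<typename T, typename TAtomic>
__global__ void myfunc1(T *address, TAtomic atomicFunc) {
    atomicFunc(address, (T)1);
}

// Hypothetical helper: fetches the device pointer, launches the kernel once
// on a zeroed float, and returns the value read back from device memory.
float run_with_device_pointer() {
    AtomicFn h_atomic_fn = nullptr;
    // Copy the pointer value to the host; it is only callable from device code.
    cudaMemcpyFromSymbol(&h_atomic_fn, d_atomic_fn, sizeof(AtomicFn));

    float *dev_ptr;
    cudaMalloc(&dev_ptr, sizeof(float));
    cudaMemset(dev_ptr, 0, sizeof(float));
    myfunc1<<<1,1>>>(dev_ptr, h_atomic_fn);
    float h = 0;
    cudaMemcpy(&h, dev_ptr, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_ptr);
    return h;
}
```

Compared with the functor versions, this keeps myfunc1 unchanged but makes the call inside the kernel an indirect one through a pointer, which the compiler generally cannot inline.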
