Template __host__ __device__ calling host defined functions
Question
During implementation of CUDA code I often need utility functions that will be called from device as well as from host code, so I declare them as __host__ __device__. This is fine, and possible device/host incompatibilities can be handled with #ifdef __CUDA_ARCH__.
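For context, the __CUDA_ARCH__ pattern mentioned above could look like the following minimal sketch (clamp_to_byte is a hypothetical helper, not from the question; __CUDA_ARCH__ is defined only during nvcc's device compilation pass, so the two halves of a __host__ __device__ function can diverge):

```cuda
#include <algorithm>

// Callable from both host and device; each compilation pass
// sees only its own branch of the #ifdef.
__host__ __device__
inline int clamp_to_byte(int x) {
#ifdef __CUDA_ARCH__
    // device pass: CUDA's built-in min/max
    return min(max(x, 0), 255);
#else
    // host pass: plain standard C++
    return std::min(std::max(x, 0), 255);
#endif
}
```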
Problems come when the utility function is templated, e.g. by some functor type. If the template instance calls a __host__ function I get this warning:
calling a __host__ function from a __host__ __device__ function is not allowed
detected during instantiation of "int foo(const T &) [with T=HostObject]"
The only solution I know is to define the function twice - once for device and once for host code, with different names (I cannot overload on __host__ __device__). But this means code duplication, and all other __host__ __device__ functions that call it must also be defined twice (even more code duplication).
Simplified example:
#include <cuda.h>
#include <iostream>

struct HostObject {
    __host__
    int value() const { return 42; }
};

struct DeviceObject {
    __device__
    int value() const { return 3; }
};

template <typename T>
__host__ __device__
int foo(const T &obj) {
    return obj.value();
}

/*
template <typename T>
__host__
int foo_host(const T &obj) {
    return obj.value();
}

template <typename T>
__device__
int foo_device(const T &obj) {
    return obj.value();
}
*/

__global__ void kernel(int *data) {
    data[threadIdx.x] = foo(DeviceObject());
}

int main() {
    foo(HostObject());

    int *data;
    cudaMalloc((void**)&data, sizeof(int) * 64);
    kernel<<<1, 64>>>(data);
    cudaThreadSynchronize();
    cudaFree(data);
}
The warning is caused by the foo(HostObject()); call inside the main() function.
foo_host<> and foo_device<> are possible replacements for the problematic foo<>.
Is there a better solution? Can I prevent instantiation of foo() on the device side?
Answer
You cannot prevent instantiation of either half of a __host__ __device__ function template. If you instantiate the function by calling it on the host (device), the compiler will also instantiate the device (host) half.
The best you can do for your use case as of CUDA 7.0 is to suppress the warning with #pragma hd_warning_disable, as in the following example, and ensure that the function is never called incorrectly.
#include <iostream>
#include <cstdio>

#pragma hd_warning_disable
template<class Function>
__host__ __device__
void invoke(Function f)
{
    f();
}

struct host_only
{
    __host__
    void operator()()
    {
        std::cout << "host_only()" << std::endl;
    }
};

struct device_only
{
    __device__
    void operator()()
    {
        printf("device_only(): thread %d\n", threadIdx.x);
    }
};

__global__
void kernel()
{
    // use from device with device functor
    invoke(device_only());

    // XXX error
    // invoke(host_only());
}

int main()
{
    // use from host with host functor
    invoke(host_only());

    kernel<<<1,1>>>();
    cudaDeviceSynchronize();

    // XXX error
    // invoke(device_only());

    return 0;
}
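Applied to the foo<> template from the question, the same suppression might look like the sketch below. The convention (used internally by Thrust, which wraps this pragma in a macro) is to place the pragma directly before each offending function template; whether it affects only the next declaration or the rest of the file is not well documented, so treat the placement as an assumption:

```cuda
// Suppress the cross-execution-space warning for this template,
// mirroring the pattern from the invoke() example above.
#pragma hd_warning_disable
template <typename T>
__host__ __device__
int foo(const T &obj) {
    return obj.value();   // caller must still match T to the right side
}
```

As in the invoke() example, the pragma only silences the diagnostic; calling foo(HostObject()) from a kernel would still be invalid at run time.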