Template __host__ __device__ calling host defined functions

Problem description
During implementation of CUDA code I often need utility functions that will be called from both device and host code, so I declare them as __host__ __device__. This is fine, and possible device/host incompatibilities can be handled with #ifdef __CUDA_ARCH__.
Problems arise when the utility function is templated, e.g. by some functor type. If the template instance calls a __host__ function, I get this warning:
calling a __host__ function from a __host__ __device__ function is not allowed
detected during instantiation of "int foo(const T &) [with T=HostObject]"
The only solution I know is to define the function twice, once for device and once for host code, under different names (I cannot overload on __host__ __device__). But this means code duplication, and every other __host__ __device__ function that calls it must also be defined twice (even more code duplication).
Simplified example:
#include <cuda.h>
#include <iostream>

struct HostObject {
    __host__
    int value() const { return 42; }
};

struct DeviceObject {
    __device__
    int value() const { return 3; }
};

template <typename T>
__host__ __device__
int foo(const T &obj) {
    return obj.value();
}

/*
template <typename T>
__host__
int foo_host(const T &obj) {
    return obj.value();
}

template <typename T>
__device__
int foo_device(const T &obj) {
    return obj.value();
}
*/

__global__ void kernel(int *data) {
    data[threadIdx.x] = foo(DeviceObject());
}

int main() {
    foo(HostObject());

    int *data;
    cudaMalloc((void**)&data, sizeof(int) * 64);
    kernel<<<1, 64>>>(data);
    cudaThreadSynchronize();
    cudaFree(data);
}
The warning is caused by the foo(HostObject()); call inside the main() function.
foo_host<> and foo_device<> are possible replacements for the problematic foo<>.
Is there a better solution? Can I prevent instantiation of foo() on the device side?
Recommended answer
You cannot prevent instantiation of either half of a __host__ __device__ function template instantiation. If you instantiate the function by calling it on the host (device), the compiler will also instantiate the device (host) half.
The best you can do for your use case as of CUDA 7.0 is to suppress the warning using #pragma hd_warning_disable, as in the following example, and ensure that the function is not called incorrectly.
#include <iostream>
#include <cstdio>

#pragma hd_warning_disable
template<class Function>
__host__ __device__
void invoke(Function f)
{
    f();
}

struct host_only
{
    __host__
    void operator()()
    {
        std::cout << "host_only()" << std::endl;
    }
};

struct device_only
{
    __device__
    void operator()()
    {
        printf("device_only(): thread %d\n", threadIdx.x);
    }
};

__global__
void kernel()
{
    // use from device with device functor
    invoke(device_only());

    // XXX error
    // invoke(host_only());
}

int main()
{
    // use from host with host functor
    invoke(host_only());

    kernel<<<1,1>>>();
    cudaDeviceSynchronize();

    // XXX error
    // invoke(device_only());

    return 0;
}