Template __host__ __device__ calling host defined functions


Problem description

While implementing CUDA code I often need utility functions that can be called from device code as well as from host code, so I declare these functions as __host__ __device__. This is fine, and possible device/host incompatibilities can be handled with #ifdef __CUDA_ARCH__.
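
For example, a minimal sketch of such a utility (the function name scaled and its two variants are purely illustrative, not from the original post) could branch on __CUDA_ARCH__ inside a single function body:

#include <cuda_runtime.h>

// __CUDA_ARCH__ is defined only during the device-side compilation pass,
// so host/device differences can be handled inside one function body.
__host__ __device__
inline int scaled(int x) {
#ifdef __CUDA_ARCH__
    return 2 * x;   // device-side variant
#else
    return 3 * x;   // host-side variant
#endif
}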

Problems come when the utility function is templated, e.g. by some functor type. If the template instance calls a __host__ function, I get this warning:

calling a __host__ function from a __host__ __device__ function is not allowed
      detected during instantiation of "int foo(const T &) [with T=HostObject]" 

The only solution I know is to define the function twice: once for device code and once for host code, under different names (I cannot overload on __host__ __device__). But this means code duplication, and every other __host__ __device__ function that calls it must also be defined twice (even more code duplication).

Simplified example:

#include <cuda.h>
#include <iostream>

struct HostObject {
    __host__ 
    int value() const { return 42; }
};

struct DeviceObject {
    __device__ 
    int value() const { return 3; }
};

template <typename T> 
__host__ __device__ 
int foo(const T &obj) {
    return obj.value();
}

/*
template <typename T> 
__host__ 
int foo_host(const T &obj) {
    return obj.value();
}

template <typename T> 
__device__ 
int foo_device(const T &obj) {
    return obj.value();
}
*/

__global__ void kernel(int *data) {
    data[threadIdx.x] = foo(DeviceObject());
}

int main() {
    foo(HostObject());

    int *data;
    cudaMalloc((void**)&data, sizeof(int) * 64);
    kernel<<<1, 64>>>(data);
    cudaDeviceSynchronize();
    cudaFree(data);
}

The warning is caused by the foo(HostObject()); call inside the main() function.

foo_host<> and foo_device<> are possible replacements for the problematic foo<>.

Is there a better solution? Can I prevent instantiation of foo() on the device side?

Recommended answer

You cannot prevent instantiation of either half of a __host__ __device__ function template instantiation. If you instantiate the function by calling it on the host (device), the compiler will also instantiate the device (host) half.

The best you can do for your use case as of CUDA 7.0 is to suppress the warning using #pragma hd_warning_disable, as in the following example, and ensure that the function is never called incorrectly.

#include <iostream>
#include <cstdio>

#pragma hd_warning_disable
template<class Function>
__host__ __device__
void invoke(Function f)
{
  f();
}

struct host_only
{
  __host__
  void operator()()
  {
    std::cout << "host_only()" << std::endl;
  }
};

struct device_only
{
  __device__
  void operator()()
  {
    printf("device_only(): thread %d
", threadIdx.x);
  }
};

__global__
void kernel()
{
  // use from device with device functor
  invoke(device_only());

  // XXX error
  // invoke(host_only());
}

int main()
{
  // use from host with host functor
  invoke(host_only());

  kernel<<<1,1>>>();
  cudaDeviceSynchronize();

  // XXX error
  // invoke(device_only());

  return 0;
}
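
Assuming the code above is saved as invoke_example.cu (the file name is only a placeholder), it can be compiled and run with nvcc:

nvcc -o invoke_example invoke_example.cu
./invoke_example

On a working setup it should print host_only() from the host call, followed by device_only(): thread 0 from the kernel.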
