'inline' for __global__ functions to avoid multiple definition error

Question

I have a CUDA template library, in which one function is actually not a template, but is defined within a .cuh header. (vector_add_kernel in kernel.cuh below.)

If multiple .cu files include kernel.cuh and call vector_add[_kernel], it will result in multiple definition errors at link-time. In C++, one can use the inline qualifier to avoid such errors.

However, inline __global__ ... - while preventing the multiple definition errors on my system - results in a warning that the inline qualifier has been ignored.

Q: Is there a better way to avoid the multiple definition error, or a way to suppress this warning only for this function? And is inline __global__ even safe, or might other host compilers truly ignore it?

I could simply move the vector_add_kernel to a separate .cu file, but it would be the only non-header file. I could also template vector_add_kernel, but in my library that makes little sense.

A (not-so-minimal, sorry) working example (tested with CUDA 7.0, gcc 4.7.2 on Debian) is below.

To clarify, main.cu is some user's code; lib.cu is some external library not belonging to me; and kernel.cuh is part of my template library. So, both the external lib and the user's main are using my template library, kernel.cuh - but separately.

main.cu:

#include "lib.hpp"
#include "kernel.cuh"

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <cstddef>
#include <cstdlib>
#include <iostream>

int main(void)
{
    const size_t N = 1u << 7;

    float* a = (float*) malloc(N * sizeof(float));
    float* b = (float*) malloc(N * sizeof(float));
    float* c = (float*) malloc(N * sizeof(float));

    // size_t indices avoid signed/unsigned comparisons against N.
    for (size_t i = 0; i < N; ++i) {
        a[i] = b[i] = 2.0f * i;
    }

    // Path 1: the external library, which includes kernel.cuh itself.
    lib_vector_add(a, b, c, N);
    for (size_t i = 0; i < N; ++i) {
        if (c[i] != 2.0f * i + 2.0f * i)
            std::cout << "Error, lib, element " << i << std::endl;
    }

    thrust::device_vector<float> d_a(a, a + N);
    thrust::device_vector<float> d_b(b, b + N);
    thrust::device_vector<float> d_c(N);

    // Path 2: the user's code calling kernel.cuh directly.
    vector_add(d_a, d_b, d_c);
    thrust::host_vector<float> h_c = d_c;
    for (size_t i = 0; i < N; ++i) {
        if (h_c[i] != 2.0f * i + 2.0f * i)
            std::cout << "Error, element " << i << std::endl;
    }

    free(a);
    free(b);
    free(c);
    return 0;
}

lib.cu

#include <kernel.cuh>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

void lib_vector_add(float* a, float* b, float* c, size_t N)
{
    thrust::host_vector<float> h_a(a, a + N);
    thrust::host_vector<float> h_b(b, b + N);

    thrust::device_vector<float> d_a = h_a;
    thrust::device_vector<float> d_b = h_b;
    thrust::device_vector<float> d_c(N);

    vector_add(d_a, d_b, d_c);

    thrust::host_vector<float> h_c = d_c;
    for (size_t i = 0; i < N; ++i)
    {
        c[i] = h_c[i];
    }
}

lib.hpp

#pragma once

#include <cstddef>

void lib_vector_add(float*, float*, float*, size_t);

kernel.cuh - this form results in a linker error. Uncomment the first inline to get working code.

#pragma once

#include <thrust/device_vector.h>
#include <cstddef>

// inline keyword avoids multiple definition errors, but produces warnings.
// UNCOMMENT TO GET A WORKING EXECUTABLE.
// inline
__global__ void vector_add_kernel(
    const float *const a,
    const float *const b,
    float *const c,
    const size_t N)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}

// inline produces no warnings.
inline
void vector_add(
    const thrust::device_vector<float>& d_a,
    const thrust::device_vector<float>& d_b,
    thrust::device_vector<float>& d_c)
{
    const float *const a_ptr = thrust::raw_pointer_cast(d_a.data());
    const float *const b_ptr = thrust::raw_pointer_cast(d_b.data());
    float *const c_ptr = thrust::raw_pointer_cast(d_c.data());

    const size_t N = d_a.size();

    dim3 block(128);
    dim3 grid((N + 127) / 128);

    vector_add_kernel<<<grid, block>>>(a_ptr, b_ptr, c_ptr, N);
}

Makefile

OBJS = main.o lib.o
DEPS = kernel.cuh
CU_ARCH = -gencode arch=compute_20,code=sm_20

all: app

# Recipe lines must begin with a tab character.
app: $(OBJS)
	nvcc $(CU_ARCH) $(OBJS) -o app

%.o: %.cu $(DEPS)
	nvcc $(CU_ARCH) -dc -I./ $< -o $@

clean:
	-rm *.o

Recommended answer

If you want to keep your current code organisation, a very simple solution is to declare your kernel static (in place of your inline keyword). This stops the linker from complaining, because each compilation unit gets its own internal-linkage copy of the kernel. The cost is that a separate copy of the kernel is generated in every compilation unit (object file) that includes kernel.cuh.
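As a rough illustration of the static approach, kernel.cuh might look something like the sketch below. It is only a sketch under a few assumptions that are not in the question: the size check, the early return for empty input, and the choice to make the host wrapper static as well (so that one externally visible inline function does not refer to a different kernel in each compilation unit) are conservative additions.

```cuda
#pragma once

#include <thrust/device_vector.h>
#include <cassert>
#include <cstddef>

// `static` gives the kernel internal linkage: every .cu file that includes
// this header gets its own private copy, so the linker sees no duplicates.
static __global__ void vector_add_kernel(
    const float *const a,
    const float *const b,
    float *const c,
    const size_t N)
{
    // Grid-stride loop, as in the question's kernel.
    size_t tid = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;

    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid += stride;
    }
}

// Assumption: the wrapper is made static too, since it calls a static kernel.
static inline void vector_add(
    const thrust::device_vector<float>& d_a,
    const thrust::device_vector<float>& d_b,
    thrust::device_vector<float>& d_c)
{
    // Assumption: all three vectors are expected to have the same length.
    assert(d_a.size() == d_b.size() && d_a.size() == d_c.size());

    const size_t N = d_a.size();
    if (N == 0) return;  // a zero-sized grid is not a valid launch

    const float *const a_ptr = thrust::raw_pointer_cast(d_a.data());
    const float *const b_ptr = thrust::raw_pointer_cast(d_b.data());
    float *const c_ptr = thrust::raw_pointer_cast(d_c.data());

    dim3 block(128);
    dim3 grid(static_cast<unsigned int>((N + 127) / 128));

    vector_add_kernel<<<grid, block>>>(a_ptr, b_ptr, c_ptr, N);
}
```

With this form, main.cu and lib.cu can both include the header unchanged; each object file simply carries its own copy of the kernel.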

Another solution would be to templatise your kernel. I know you already dismissed this possibility, but you should reconsider it, since your kernel is a natural candidate for a template over the element type of its parameters (currently float).
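A templated variant could be sketched roughly as follows. Function templates may be defined in a header that is included from several compilation units without causing multiple definition errors, so neither inline nor static is needed on the kernel. The template parameter name T, the size check, and the early return for empty input are assumptions made for this sketch, not part of the original library.

```cuda
#pragma once

#include <thrust/device_vector.h>
#include <cassert>
#include <cstddef>

// Templated on the element type; instantiated on demand in each .cu file.
template <typename T>
__global__ void vector_add_kernel(
    const T *const a,
    const T *const b,
    T *const c,
    const size_t N)
{
    // Grid-stride loop, as in the question's kernel.
    size_t tid = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    const size_t stride = static_cast<size_t>(blockDim.x) * gridDim.x;

    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid += stride;
    }
}

// T is deduced from the device_vector arguments at the call site.
template <typename T>
void vector_add(
    const thrust::device_vector<T>& d_a,
    const thrust::device_vector<T>& d_b,
    thrust::device_vector<T>& d_c)
{
    // Assumption: all three vectors are expected to have the same length.
    assert(d_a.size() == d_b.size() && d_a.size() == d_c.size());

    const size_t N = d_a.size();
    if (N == 0) return;  // a zero-sized grid is not a valid launch

    const T *const a_ptr = thrust::raw_pointer_cast(d_a.data());
    const T *const b_ptr = thrust::raw_pointer_cast(d_b.data());
    T *const c_ptr = thrust::raw_pointer_cast(d_c.data());

    dim3 block(128);
    dim3 grid(static_cast<unsigned int>((N + 127) / 128));

    vector_add_kernel<T><<<grid, block>>>(a_ptr, b_ptr, c_ptr, N);
}
```

Existing callers such as vector_add(d_a, d_b, d_c) with thrust::device_vector<float> arguments would keep compiling as written, and the same header would then also serve other element types such as double.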
