'inline' for __global__ functions to avoid multiple definition error

Views: 312 Published: 2020/5/1 3:36:20 Tags: c++, cuda, linker

Problem description

I have a CUDA template library, in which one function is actually not a template, but is defined within a .cuh header. (vector_add_kernel in kernel.cuh below.)

If multiple .cu files include kernel.cuh and call vector_add[_kernel], it will result in multiple definition errors at link-time. In C++, one can use the inline qualifier to avoid such errors.
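
For reference, the plain C++ behaviour mentioned above can be sketched as follows. This is a host-only illustration, not part of the original library: the header name and the function `vector_add_host` are hypothetical. It only shows how `inline` lets a non-template function live in a header that several translation units include.

```cpp
// vector_add_host.hpp (hypothetical header name, for illustration only)
#ifndef VECTOR_ADD_HOST_HPP
#define VECTOR_ADD_HOST_HPP

#include <cstddef>
#include <vector>

// Without `inline`, every .cpp file that includes this header would emit its
// own external definition of vector_add_host, and the link step would fail
// with a multiple definition error. With `inline`, each translation unit may
// carry an identical definition and the linker keeps a single one.
// Assumes a and b have the same length.
inline void vector_add_host(const std::vector<float>& a,
                            const std::vector<float>& b,
                            std::vector<float>& c)
{
    const std::size_t n = a.size();
    c.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

#endif // VECTOR_ADD_HOST_HPP
```

Whether nvcc honours the same keyword on a `__global__` function is exactly what the question asks, so this sketch says nothing about device code.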

However, inline __global__ ... - while preventing the multiple definition errors on my system - results in a warning that the inline qualifier has been ignored.

Q: Is there a better way to avoid the multiple definition error, or a way to suppress this warning only for this function? And is inline __global__ even safe, or might other host compilers truly ignore it?

I could simply move the vector_add_kernel to a separate .cu file, but it would be the only non-header file. I could also template vector_add_kernel, but in my library that makes little sense.
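
To make the templating alternative concrete, here is a minimal host-only sketch. It is an analogue rather than the actual kernel: `vector_add_generic` is a hypothetical name, and the element type `T` is introduced only so that the function becomes a template.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for a templated kernel. A function template
// may be defined in every translation unit that includes the header, so no
// `inline` keyword is needed to avoid multiple definition errors at link time.
// Assumes a and b have the same length.
template <typename T>
void vector_add_generic(const std::vector<T>& a,
                        const std::vector<T>& b,
                        std::vector<T>& c)
{
    const std::size_t n = a.size();
    c.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

The cost is the one noted above: the template parameter exists only to change the linkage behaviour, which may not fit a library whose kernel is always used with float.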

A (not-so-minimal, sorry) working example (tested with CUDA 7.0, gcc 4.7.2 on Debian) is below.

To clarify, main.cu is some user's code; lib.cu is some external library not belonging to me; and kernel.cuh is part of my template library. So, both the external lib and the user's main are using my template library, kernel.cuh - but separately.

main.cu:

#include "lib.hpp"
#include "kernel.cuh"

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <cstddef>
#include <cstdlib>
#include <iostream>

int main(void)
{
    const size_t N = 1u << 7;

    float* a = (float*) malloc(N * sizeof(float));
    float* b = (float*) malloc(N * sizeof(float));
    float* c = (float*) malloc(N * sizeof(float));

    for (int i = 0; i < N; ++i) {
        a[i] = b[i] = 2.0f * i;
    }

    lib_vector_add(a, b, c, N);
    for (int i = 0; i < N; ++i) {
        if (c[i] != 2.0f * i + 2.0f * i)
            std::cout << "Error, lib, element " << i << std::endl;
    }

    thrust::device_vector<float> d_a(a, a + N);
    thrust::device_vector<float> d_b(b, b + N);
    thrust::device_vector<float> d_c(N);

    vector_add(d_a, d_b, d_c);
    thrust::host_vector<float> h_c = d_c;
    for (int i = 0; i < N; ++i) {
        if (h_c[i] != 2.0f * i + 2.0f * i)
            std::cout << "Error, element " << i << std::endl;
    }
}

lib.cu:

#include <kernel.cuh>

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

void lib_vector_add(float* a, float* b, float* c, size_t N)
{
    thrust::host_vector<float> h_a(a, a + N);
    thrust::host_vector<float> h_b(b, b + N);

    thrust::device_vector<float> d_a = h_a;
    thrust::device_vector<float> d_b = h_b;
    thrust::device_vector<float> d_c(N);

    vector_add(d_a, d_b, d_c);

    thrust::host_vector<float> h_c = d_c;
    for (int i = 0; i < N; ++i)
    {
        c[i] = h_c[i];
    }
}

lib.hpp:

#pragma once

#include <cstddef>

void lib_vector_add(float*, float*, float*, size_t);

kernel.cuh (this form results in a linker error; uncomment the first inline to get working code):

#pragma once

#include <thrust/device_vector.h>
#include <cstddef>

// inline keyword avoids multiple definition errors, but produces warnings.
// UNCOMMENT TO GET A WORKING EXECUTABLE.
// inline
__global__ void vector_add_kernel(
    const float *const a,
    const float *const b,
    float *const c,
    const size_t N)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    while (tid < N)
    {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}

// inline produces no warnings.
inline
void vector_add(
    const thrust::device_vector<float>& d_a,
    const thrust::device_vector<float>& d_b,
    thrust::device_vector<float>& d_c)
{
    const float *const a_ptr = thrust::raw_pointer_cast(d_a.data());
    const float *const b_ptr = thrust::raw_pointer_cast(d_b.data());
    float *const c_ptr = thrust::raw_pointer_cast(d_c.data());

    const size_t N = d_a.size();

    dim3 block(128);
    dim3 grid((N + 127) / 128);

    vector_add_kernel<<<grid, block>>>(a_ptr, b_ptr, c_ptr, N);
}

Makefile:

OBJS = main.o lib.o
DEPS = kernel.cuh
CU_ARCH = -gencode arch=compute_20,code=sm_20

all: app

app: $(OBJS)
	nvcc $(CU_ARCH) $(OBJS) -o app

%.o: %.cu $(DEPS)
	nvcc $(CU_ARCH) -dc -I./ $< -o $@

clean:
	-rm *.o
