Thrust error with CUDA separate compilation
Problem description
I'm running into an error when I try to compile CUDA code with relocatable device code enabled (-rdc=true). I'm using Visual Studio 2013 as the compiler with CUDA 7.5. Below is a small example that reproduces the error. To clarify, the code below runs fine with -rdc=false, but when it is set to true, the error shows up.
The error simply says: CUDA error 11 [\cuda\detail\cub\device\dispatch/device_radix_sort_dispatch.cuh, 687]: invalid argument
Then I found this, which says:
When invoked with primitive data types, thrust::sort, thrust::sort_by_key, thrust::stable_sort, thrust::stable_sort_by_key may fail to link in some cases with nvcc -rdc=true.
Is there some workaround to allow separate compilation?
main.cpp:
#include <stdio.h>
#include <exception>
#include <vector>
#include "cuda_runtime.h"
#include "RadixSort.h"

typedef unsigned int uint;
typedef unsigned __int64 uint64;

int main()
{
    RadixSort sorter;
    uint n = 10;

    // Fill the host buffer with 1..n
    std::vector<uint64> test(n);
    for (uint i = 0; i < n; i++)
        test[i] = i + 1;

    // Copy the data to the device
    uint64 * d_array;
    uint64 size = n * sizeof(uint64);
    cudaMalloc(&d_array, size);
    cudaMemcpy(d_array, test.data(), size, cudaMemcpyHostToDevice);

    try
    {
        // Thrust reports the CUDA error by throwing an exception
        sorter.Sort(d_array, n);
    }
    catch (const std::exception & ex)
    {
        printf("%s\n", ex.what());
    }

    cudaFree(d_array);
    return 0;
}
RadixSort.h:
#pragma once

typedef unsigned int uint;
typedef unsigned __int64 uint64;

class RadixSort
{
public:
    RadixSort() {}
    ~RadixSort() {}
    void Sort(uint64 * input, const uint n);
};
RadixSort.cu:
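#include "RadixSort.h"
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

void RadixSort::Sort(uint64 * input, const uint n)
{
    // Wrap the raw device pointer so Thrust runs the sort on the device
    thrust::device_ptr<uint64> d_input = thrust::device_pointer_cast(input);
    // For a primitive key type such as uint64, Thrust dispatches this to
    // CUB's radix sort (the file named in the error message)
    thrust::stable_sort(d_input, d_input + n);
    cudaDeviceSynchronize();
}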
Recommended answer
As mentioned in the comments by Robert Crovella:
Compiling for a higher CUDA architecture solves this problem. In my case, I changed it to compute_30,sm_30 under CUDA C/C++ -> Device -> Code Generation.
Edit:
The general recommendation is to select the architecture that best matches your specific GPU. See the link in the comments for additional information.