Compiling CUDA with dynamic parallelism fallback - multiple architectures/compute capability
Question
In one application, I've got a bunch of CUDA kernels. Some use dynamic parallelism and some don't. For the purposes of either providing a fallback option if this is not supported, or simply allowing the application to continue but with reduced/partially available features, how can I go about compiling?
At the moment I'm getting invalid device function when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5.
AFAIK you can't use multiple -arch=sm_* arguments, and using multiple -gencode=* doesn't help. Also, for separable compilation I've had to create an additional object file using -dlink, but this doesn't get created when targeting compute 3.0 (nvlink fatal : no candidate found in fatbinary, due to -lcudadevrt, which I've needed for 3.5). How should I deal with this?
Answer
I believe this issue has now been addressed in CUDA 6.
A simple test:
$ cat t264.cu
#include <stdio.h>
__global__ void kernel1(){
printf("Hello from DP Kernel\n");
}
__global__ void kernel2(){
#if __CUDA_ARCH__ >= 350
kernel1<<<1,1>>>();
#else
printf("Hello from non-DP Kernel\n");
#endif
}
int main(){
kernel2<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
$ nvcc -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -rdc=true -o t264 t264.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="0" ./t264
Hello from non-DP Kernel
$ CUDA_VISIBLE_DEVICES="1" ./t264
Hello from DP Kernel
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Jan_25_17:33:19_PST_2014
Cuda compilation tools, release 6.0, V6.0.1
$
In my case, device 0 is a Quadro5000, a cc 2.0 device, and device 1 is a GeForce GT 640, a cc 3.5 device.
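As a complement to the compile-time `__CUDA_ARCH__` guard in the test above, the host can also query the device's compute capability at runtime and choose which kernel to launch. The sketch below is illustrative and not from the original answer; the kernel names (parent, child, flat) are made up for the example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child() { printf("Hello from DP child\n"); }

__global__ void parent() {
#if __CUDA_ARCH__ >= 350
    child<<<1,1>>>();   // dynamic parallelism path, only compiled for cc >= 3.5
#endif
}

__global__ void flat() { printf("Hello from non-DP kernel\n"); }

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // Dynamic parallelism requires compute capability 3.5 or higher
    bool has_dp = (prop.major > 3) || (prop.major == 3 && prop.minor >= 5);
    if (has_dp)
        parent<<<1,1>>>();
    else
        flat<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

This would still be built the same way as the test program, e.g. with -rdc=true, multiple -gencode entries, and -lcudadevrt, so the fatbinary contains code for both architectures and the host picks the appropriate kernel per device.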