CUDA Dynamic Parallelism MakeFile

Views: 130 | Published: 2016/8/19 14:10:40 | Tags: c, cuda


Problem description

This is my first program using Dynamic Parallelism and I am unable to compile the code. I need to be able to run this for my research project at college and any help will be most appreciated:

I get the following error:

/cm/shared/apps/cuda50/toolkit/5.0.35/bin/nvcc -m64 -dc  -gencode arch=compute_35,code=sm_35 -rdc=true -dlink -po maxrregcount=16 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes.o -c BlackScholes.cu
g++ -m64 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes_gold.o -c BlackScholes_gold.cpp
g++ -m64 -o BlackScholes BlackScholes.o BlackScholes_gold.o -L/cm/shared/apps/cuda50/toolkit/5.0.35/lib64 -lcudart -lcudadevrt
BlackScholes.o: In function `__sti____cudaRegisterAll_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec()':
tmpxft_000059cb_00000000-3_BlackScholes.cudafe1.cpp:(.text+0x1354): undefined reference to `__cudaRegisterLinkedBinary_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec'
collect2: ld returned 1 exit status
make: *** [BlackScholes] Error 1

I have one cpp file, one cu file and one cuh file. Important portions of my makefile are below:

# CUDA code generation flags
#GENCODE_SM10    := -gencode arch=compute_10,code=sm_10
GENCODE_SM20     := -gencode arch=compute_20,code=sm_20
GENCODE_SM30     := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35
GENCODE_SM35     := -gencode arch=compute_35,code=sm_35
#GENCODE_FLAGS    := $(GENCODE_SM10) $(GENCODE_SM20) $(GENCODE_SM30)
GENCODE_FLAGS    := $(GENCODE_SM35)

# OS-specific build flags
ifneq ($(DARWIN),)
      LDFLAGS   := -Xlinker -rpath $(CUDA_LIB_PATH) -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -arch $(OS_ARCH)
else
  ifeq ($(OS_SIZE),32)
      LDFLAGS   := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -m32
  else
      LDFLAGS   := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
      CCFLAGS   := -m64
  endif
endif

# OS-architecture specific flags
ifeq ($(OS_SIZE),32)
      NVCCFLAGS := -m32 -dc
else
      NVCCFLAGS := -m64 -dc
endif

# Debug build flags
ifeq ($(dbg),1)
      CCFLAGS   += -g
      NVCCFLAGS += -g -G
      TARGET := debug
else
      TARGET := release
endif


# Common includes and paths for CUDA
INCLUDES      := -I$(CUDA_INC_PATH) -I. -I.. -I../../common/inc

# Additional parameters
MAXRREGCOUNT  :=  -po maxrregcount=16

# Target rules
all: build

build: BlackScholes

BlackScholes.o: BlackScholes.cu
        $(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) -rdc=true -dlink $(MAXRREGCOUNT) $(INCLUDES) -o $@ -c $<

BlackScholes_gold.o: BlackScholes_gold.cpp
        $(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<

BlackScholes: BlackScholes.o BlackScholes_gold.o
        $(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)
        mkdir -p ../../bin/$(OSLOWER)/$(TARGET)
        cp $@ ../../bin/$(OSLOWER)/$(TARGET)

run: build
        ./BlackScholes

Solution

When using the host linker (g++) for final linking of your executable, and when using relocatable device code (nvcc -dc), it's necessary to do an intermediate device code link step.

From the documentation:

If you want to invoke the device and host linker separately, you can do:

nvcc -arch=sm_20 -dc a.cu b.cu
nvcc -arch=sm_20 -dlink a.o b.o -o link.o
g++ a.o b.o link.o -L<path> -lcudart

Since you are specifying -dc on the compile line, you are getting a compile-only operation (just as if you had specified -c to g++).
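Applied to the files in the question, the sequence would look roughly like the sketch below. It only prints the commands unless `RUN=1` is set, so it can be inspected without a CUDA toolkit. The function name `build_commands`, the `RUN` switch, and the default library path are assumptions made for illustration; `bs_link.o` is simply a name chosen for the device-link output.

```shell
#!/bin/sh
# Sketch: separate compile, device-link, and host-link stages for a project
# with one .cu file that uses relocatable device code.
# Assumption: CUDA libraries live in /usr/local/cuda/lib64 unless overridden.
CUDA_LIB_PATH="${CUDA_LIB_PATH:-/usr/local/cuda/lib64}"
ARCH="-gencode arch=compute_35,code=sm_35"

build_commands() {
    # 1. Compile only (-dc): produce an object with relocatable device code.
    echo "nvcc -m64 -dc $ARCH -o BlackScholes.o BlackScholes.cu"
    # 2. Device link (-dlink): write the linked device code to its own object.
    echo "nvcc -m64 $ARCH -dlink -o bs_link.o BlackScholes.o"
    # 3. Compile the host-only source with g++.
    echo "g++ -m64 -o BlackScholes_gold.o -c BlackScholes_gold.cpp"
    # 4. Host link: every object, including the device-link output.
    echo "g++ -m64 -o BlackScholes BlackScholes.o BlackScholes_gold.o bs_link.o -L$CUDA_LIB_PATH -lcudart -lcudadevrt"
}

# Dry run by default; set RUN=1 to execute the commands, stopping on error.
if [ "${RUN:-0}" = "1" ]; then
    build_commands | sh -e
else
    build_commands
fi
```

The point to notice is that step 2 is a separate nvcc invocation and that its output object appears on the final g++ line; leaving it out is what produces the undefined reference to `__cudaRegisterLinkedBinary_...`.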

Here's a modified/condensed Makefile that should show you what is involved:

GENCODE_SM35     := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS    := $(GENCODE_SM35)

LDFLAGS   := -L/usr/local/cuda/lib64 -lcudart -lcudadevrt
CCFLAGS   := -m64

NVCCFLAGS := -m64 -dc

NVCC := nvcc
GCC := g++

# Debug build flags
ifeq ($(dbg),1)
      CCFLAGS   += -g
      NVCCFLAGS += -g -G
      TARGET := debug
else
      TARGET := release
endif


# Common includes and paths for CUDA
INCLUDES      := -I/usr/local/cuda/include -I. -I..

# Additional parameters
MAXRREGCOUNT  :=  -po maxrregcount=16

# Target rules
all: build

build: BlackScholes

BlackScholes.o: BlackScholes.cu
        $(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) $(MAXRREGCOUNT) $(INCLUDES) -o $@ $<
        $(NVCC) -dlink  $(GENCODE_FLAGS) $(MAXRREGCOUNT)  -o bs_link.o $@

BlackScholes_gold.o: BlackScholes_gold.cpp
        $(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<

BlackScholes: BlackScholes.o BlackScholes_gold.o bs_link.o
        $(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)

run: build
        ./BlackScholes
