CUDA Dynamic Parallelism Makefile
Problem description
This is my first program using Dynamic Parallelism and I am unable to compile the code. I need to be able to run this for my research project at college and any help will be most appreciated:
I get the following error:
/cm/shared/apps/cuda50/toolkit/5.0.35/bin/nvcc -m64 -dc -gencode arch=compute_35,code=sm_35 -rdc=true -dlink -po maxrregcount=16 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes.o -c BlackScholes.cu
g++ -m64 -I/cm/shared/apps/cuda50/toolkit/5.0.35 -I. -I.. -I../../common/inc -o BlackScholes_gold.o -c BlackScholes_gold.cpp
g++ -m64 -o BlackScholes BlackScholes.o BlackScholes_gold.o -L/cm/shared/apps/cuda50/toolkit/5.0.35/lib64 -lcudart -lcudadevrt
BlackScholes.o: In function `__sti____cudaRegisterAll_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec()':
tmpxft_000059cb_00000000-3_BlackScholes.cudafe1.cpp:(.text+0x1354): undefined reference to `__cudaRegisterLinkedBinary_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec'
collect2: ld returned 1 exit status
make: *** [BlackScholes] Error 1
I have one cpp file, one cu file and one cuh file. Important portions of my makefile are below:
# CUDA code generation flags
#GENCODE_SM10 := -gencode arch=compute_10,code=sm_10
GENCODE_SM20 := -gencode arch=compute_20,code=sm_20
GENCODE_SM30 := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35
GENCODE_SM35 := -gencode arch=compute_35,code=sm_35
#GENCODE_FLAGS := $(GENCODE_SM10) $(GENCODE_SM20) $(GENCODE_SM30)
GENCODE_FLAGS := $(GENCODE_SM35)
# OS-specific build flags
ifneq ($(DARWIN),)
LDFLAGS := -Xlinker -rpath $(CUDA_LIB_PATH) -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
CCFLAGS := -arch $(OS_ARCH)
else
ifeq ($(OS_SIZE),32)
LDFLAGS := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
CCFLAGS := -m32
else
LDFLAGS := -L$(CUDA_LIB_PATH) -lcudart -lcudadevrt
CCFLAGS := -m64
endif
endif
# OS-architecture specific flags
ifeq ($(OS_SIZE),32)
NVCCFLAGS := -m32 -dc
else
NVCCFLAGS := -m64 -dc
endif
# Debug build flags
ifeq ($(dbg),1)
CCFLAGS += -g
NVCCFLAGS += -g -G
TARGET := debug
else
TARGET := release
endif
# Common includes and paths for CUDA
INCLUDES := -I$(CUDA_INC_PATH) -I. -I.. -I../../common/inc
# Additional parameters
MAXRREGCOUNT := -po maxrregcount=16
# Target rules
all: build
build: BlackScholes
BlackScholes.o: BlackScholes.cu
	$(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) -rdc=true -dlink $(MAXRREGCOUNT) $(INCLUDES) -o $@ -c $<
BlackScholes_gold.o: BlackScholes_gold.cpp
	$(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<
BlackScholes: BlackScholes.o BlackScholes_gold.o
	$(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)
	mkdir -p ../../bin/$(OSLOWER)/$(TARGET)
	cp $@ ../../bin/$(OSLOWER)/$(TARGET)
run: build
	./BlackScholes
When using the host linker (g++) for the final link of your executable, and when using relocatable device code (nvcc -dc), it's necessary to do an intermediate device code link step.
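The undefined reference to `__cudaRegisterLinkedBinary_...` in the error above is the usual symptom of a missing device link step. A rough way to confirm this on an object file is to inspect its symbol table with nm. The sketch below is only an illustration: the function name needs_device_link is hypothetical, and the sample line is canned nm-style output built from the symbol name in the error message, so the script runs without a CUDA toolkit.

```shell
#!/bin/sh
# Reads `nm` output on stdin and succeeds if any
# __cudaRegisterLinkedBinary_* symbol is still undefined ("U").
# The function name is hypothetical, chosen for this sketch.
needs_device_link() {
    grep -q ' U __cudaRegisterLinkedBinary_'
}

# Canned symbol-table line shaped like `nm` output for an undefined symbol.
# On a real build, pipe in the output of `nm BlackScholes.o` instead.
sample='                 U __cudaRegisterLinkedBinary_47_tmpxft_000059cb_00000000_6_BlackScholes_cpp1_ii_c58990ec'

if printf '%s\n' "$sample" | needs_device_link; then
    echo "device link step still required"
fi
```

If the final link also receives an object produced by nvcc -dlink, that object is expected to supply the missing definition.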
From the documentation:
If you want to invoke the device and host linker separately, you can do:
nvcc -arch=sm_20 -dc a.cu b.cu
nvcc -arch=sm_20 -dlink a.o b.o -o link.o
g++ a.o b.o link.o -L<path> -lcudart
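Since you are specifying -dc on the compile line, you are getting a compile-only operation (just as if you had specified -c to g++). No device-link object is produced, so the final g++ link has nothing that defines the __cudaRegisterLinkedBinary symbol named in the error.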
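Applied to the files in the question, the sequence could look roughly like the sketch below. It is an illustration under stated assumptions: the object name bs_link.o, the CUDA_LIB variable with its default path /usr/local/cuda/lib64, and the DRY_RUN switch are not from the question, and the include paths and the maxrregcount option are left out for brevity. By default the script only prints the commands, so it can be read and run without a CUDA toolkit installed.

```shell
#!/bin/sh
# Sketch of the build steps when the device and host linkers are invoked separately.
# Assumed names (not from the question): bs_link.o, CUDA_LIB, DRY_RUN.
CUDA_LIB="${CUDA_LIB:-/usr/local/cuda/lib64}"
ARCH="-gencode arch=compute_35,code=sm_35"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

build_separate() {
    # 1. Compile only, producing an object with relocatable device code.
    run nvcc -m64 $ARCH -dc -o BlackScholes.o BlackScholes.cu
    # 2. Device link: produces an extra object for the host linker.
    run nvcc -m64 $ARCH -dlink -o bs_link.o BlackScholes.o
    # 3. Ordinary host compile of the C++ source.
    run g++ -m64 -o BlackScholes_gold.o -c BlackScholes_gold.cpp
    # 4. Final host link, including the device-link object.
    run g++ -m64 -o BlackScholes BlackScholes.o BlackScholes_gold.o bs_link.o \
        -L"$CUDA_LIB" -lcudart -lcudadevrt
}

build_separate
```

Running it with DRY_RUN=0 executes the commands instead of only printing them.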
Here's a modified/condensed Makefile that should show you what is involved:
GENCODE_SM35 := -gencode arch=compute_35,code=sm_35
GENCODE_FLAGS := $(GENCODE_SM35)
LDFLAGS := -L/usr/local/cuda/lib64 -lcudart -lcudadevrt
CCFLAGS := -m64
NVCCFLAGS := -m64 -dc
NVCC := nvcc
GCC := g++
# Debug build flags
ifeq ($(dbg),1)
CCFLAGS += -g
NVCCFLAGS += -g -G
TARGET := debug
else
TARGET := release
endif
# Common includes and paths for CUDA
INCLUDES := -I/usr/local/cuda/include -I. -I..
# Additional parameters
MAXRREGCOUNT := -po maxrregcount=16
# Target rules
all: build
build: BlackScholes
# Compile to relocatable device code (-dc comes from NVCCFLAGS),
# then run the device link step, which writes bs_link.o.
BlackScholes.o: BlackScholes.cu
	$(NVCC) $(NVCCFLAGS) $(EXTRA_NVCCFLAGS) $(GENCODE_FLAGS) $(MAXRREGCOUNT) $(INCLUDES) -o $@ $<
	$(NVCC) -dlink $(GENCODE_FLAGS) $(MAXRREGCOUNT) -o bs_link.o $@
BlackScholes_gold.o: BlackScholes_gold.cpp
	$(GCC) $(CCFLAGS) $(INCLUDES) -o $@ -c $<
# The final host link includes the device-link object bs_link.o.
BlackScholes: BlackScholes.o BlackScholes_gold.o bs_link.o
	$(GCC) $(CCFLAGS) -o $@ $+ $(LDFLAGS) $(EXTRA_LDFLAGS)
run: build
	./BlackScholes
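The documentation excerpt above covers invoking the device and host linkers separately. As an alternative sketch, nvcc can drive the final link itself, in which case it performs the device link implicitly and no bs_link.o is needed. The function name link_with_nvcc and the DRY_RUN switch are assumptions made for this illustration. The object files are assumed to have been compiled as in the Makefile above, with -dc for the .cu file. By default the command is only printed.

```shell
#!/bin/sh
# Alternative sketch: let nvcc do the final link (device link + host link).
# Prints the command by default; set DRY_RUN=0 to execute it (needs the CUDA toolkit).
link_with_nvcc() {
    set -- nvcc -m64 -gencode arch=compute_35,code=sm_35 \
        -o BlackScholes BlackScholes.o BlackScholes_gold.o -lcudadevrt
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

link_with_nvcc
```

The trade-off is that nvcc picks the host compiler and its link options, so flags that the Makefile passes to g++ through LDFLAGS would have to be handed to nvcc instead.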