Changing the arch argument in CUDA makes me use more registers
Problem description
I have been writing a kernel for my Tesla K20m. When I compile the software with -Xptxas=-v, I get the following output:
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_10'
ptxas info : Used 8 registers, 80 bytes smem, 8 bytes cmem[1]
As you can see, only 8 registers are used. However, when I add the argument -arch=sm_35, both the execution time of my kernel and the number of registers used increase dramatically, and I am wondering why:
nvcc mysoftware.cu -Xptxas=-v -arch=sm_35
ptxas info : 0 bytes gmem
ptxas info : Compiling entry function '_Z9searchKMPPciPhiPiS1_' for 'sm_35'
ptxas info : Function properties for _Z9searchKMPPciPhiPiS1_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 21 registers, 16 bytes smem, 368 bytes cmem[0]
Several books mention that compiling for the card's actual architecture is supposed to improve performance, so I wonder why mine drops so dramatically.
Thanks.
Edit: a similar question and answer: "Registers and shared memory depending on compiling compute capability?"
Recommended answer
Compiling for sm_20 and above enables IEEE-compliant math and ABI compliance. Both can increase the register count and decrease performance, and both can be disabled.
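As a rough sketch of what disabling those two options might look like on the command line: the math flags below (-ftz, -prec-div, -prec-sqrt) are standard nvcc options, while -abi=no is an assumption that only holds for older ptxas releases, since newer toolkits deprecate or reject it. The file name mysoftware.cu is taken from the question. Whether any of this brings the register count back down depends on the kernel, so compare the -Xptxas=-v output before and after.

```shell
# Relax IEEE-compliant single-precision math (standard nvcc options):
#   -ftz=true         flush denormals to zero
#   -prec-div=false   faster, less precise division
#   -prec-sqrt=false  faster, less precise square root
FAST_MATH_FLAGS="-ftz=true -prec-div=false -prec-sqrt=false"

# Assumption: older ptxas releases accept -abi=no to turn off ABI compliance.
# Newer toolkits deprecate or reject it, so drop it if ptxas complains.
ABI_FLAGS="-Xptxas -abi=no"

CMD="nvcc mysoftware.cu -arch=sm_35 -Xptxas=-v $FAST_MATH_FLAGS $ABI_FLAGS"
echo "$CMD"

# Run the build only when nvcc is installed and the source file is present.
if command -v nvcc >/dev/null 2>&1 && [ -f mysoftware.cu ]; then
    $CMD
fi
```

The math flags only matter if the kernel actually performs floating-point division, square roots, or denormal arithmetic; for a mostly integer kernel the ABI setting is the more likely of the two to change the register count.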