Registers and shared memory depending on compiling compute capability?

Question

Hey there, when I compile with nvcc -arch=sm_13 I get:

ptxas info    : Used 29 registers, 28+16 bytes smem, 7200 bytes cmem[0], 8 bytes cmem[1] 

when I use nvcc -arch=sm_20 I get:

ptxas info    : Used 34 registers, 60 bytes cmem[0], 7200 bytes cmem[2], 4 bytes cmem[16] 

I thought all kernel parameters were passed in shared memory, but for sm_20 that doesn't seem to be the case. Perhaps they are also passed in registers? The head of my function looks like this:

__global__ void func(double *, double , double, int)

Thanks so far!

Solution

In compute capability 2.x devices, arguments to kernels are stored in constant memory. The register difference is probably down to differences in the code generated for math library functions between versions. Are there things like transcendental functions or sqrt in the kernel?
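As a rough cross-check of where the arguments end up, the size of the argument block for the signature in the question can be worked out by hand. The sketch below is plain host-side C++ (C++14 or later); `align_up` and `kernel_param_bytes` are hypothetical helper names, and it assumes 8-byte pointers and natural alignment for each argument, which the question does not state.

```cpp
#include <cstddef>

// Hypothetical helper: round `offset` up to the next multiple of `align`.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Size in bytes of the argument block for
//   __global__ void func(double *, double, double, int)
// ASSUMPTION: 8-byte pointers, each argument placed at its natural alignment.
constexpr std::size_t kernel_param_bytes() {
    std::size_t offset = 0;
    offset = align_up(offset, 8) + 8;  // double * (assumed 64-bit build)
    offset = align_up(offset, 8) + 8;  // double
    offset = align_up(offset, 8) + 8;  // double
    offset = align_up(offset, 4) + 4;  // int
    return offset;
}

// 8 + 8 + 8 + 4 bytes, checked at compile time.
static_assert(kernel_param_bytes() == 28, "unexpected argument block size");
```

Under those assumptions the result is 28 bytes, which matches the "28" in the sm_13 line "28+16 bytes smem". That fits the arguments living in shared memory on 1.x. On sm_20 the same 28 bytes would presumably be part of the 60 bytes reported for cmem[0], with the rest used by the compiler and driver. That split is an inference from the numbers, not something ptxas prints.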
