何时将volatile与寄存器/局部变量一起使用 [英] When to use volatile with register/local variables

查看:269
本文介绍了何时将volatile与寄存器/局部变量一起使用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在CUDA中使用易失性限定符声明寄存器数组是什么意思?



当我尝试将volatile关键字与寄存器数组一起使用时,它删除了溢出的寄存器数内存到本地内存。 (即强制CUDA使用寄存器而不是本地内存)这是预期的行为吗?



我没有找到有关volatile的寄存器阵列使用方面的任何信息。



这是两个版本的ptxas -v输出



带有易变限定符

  __volatile__浮点数组[32]; 

ptxas -v输出

  ptxas信息:为``sm_20''编译入口函数'_Z2swPcS_PfiiiiS0_'
ptxas信息:_Z2swPcS_PfiiiiS0_的函数属性$ 88 $堆栈帧,0字节溢出存储,0字节溢出加载b $ b ptxas info:使用47个寄存器,16640字节smem,80字节cmem [0],8字节cmem [16]

没有波动限定符

 浮点数组[32]; 

ptxas -v输出

  ptxas信息:为``sm_20''编译入口函数'_Z2swPcS_PfiiiiS0_'
ptxas信息:_Z2swPcS_PfiiiiS0_的函数属性$ 96堆栈帧,100字节溢出存储,108字节溢出加载
ptxas info:使用51个寄存器,16640字节smem,80字节cmem [0],8字节cmem [16]


解决方案

volatile 限定符向编译器指定对变量(读或写)的所有引用均应导致内存引用,并且这些引用必须按照程序中指定的顺序。 volatile 限定符的用法在Shane Cook的书 CUDA编程的第12章中进行了说明。



使用 volatile 可以避免编译器可以进行的某些优化,因此可以更改已使用的寄存器数。理解 volatile 实际作用的最好方法是使用和不使用限定符来分解相关的 __ global __ 函数。 / p>

确实考虑以下内核函数

  __ global__ void volatile_test(){ 

浮动浮点a [3]; (int i = 0; i <3; i ++)a [i] =(float)i;
}

__global__ void no_volatile_test(){

float a [3]; (int i = 0; i <3; i ++)a [i] =(float)i;
}

反汇编上面获得的内核函数


$ b_b

  sm_20 
的代码功能:_Z16no_volatile_testv
.headerflags @ EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)
/ * 0000 * / MOV R1 ,c [0x1] [0x100]; / * 0x2800440400005de4 * /
/ * 0008 * / EXIT; / * 0x8000000000001de7 * /


函数:_Z13volatile_testv
.headerflags @ EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)
/ * 0000 * / MOV R1,c [0x1] [0x100]; / * 0x2800440400005de4 * /
/ * 0008 * / ISUB R1,R1,0x10; / * 0x4800c00040105d03 * / R1 = a [0]
的地址/ * 0010 * / MOV32I R2,0x3f800000; / * 0x18fe000000009de2 * / R2 = 1
/ * 0018 * / MOV32I R0,0x40000000; / * 0x1900000000001de2 * / R0 = 2
/ * 0020 * / STL [R1],RZ; / * 0xc8000000001fdc85 * /
/ * 0028 * / STL [R1 + 0x4],R2; / * 0xc800000010109c85 * / a [0] = 0;
/ * 0030 * / STL [R1 + 0x8],R0; / * 0xc800000020101c85 * / a [1] = R2 = 1;
/ * 0038 * / EXIT; / * 0x8000000000001de7 * / a [2] = R0 = 2;

如您所见,当不使用 volatile 关键字,编译器意识到已设置 a 但从未使用过(实际上,编译器返回以下警告:已设置变量 a但从未使用过),几乎没有反汇编的代码。



与此相反,当使用 volatile 关键字时,所有对 a 的引用都转换为内存引用(在这种情况下为写操作)。


What is the meaning of declaring register arrays in CUDA with volatile qualifier?

When I tried with volatile keyword with a register array, it removed the number of spilled register memory to local memory. (i.e. Force the CUDA to use registers instead of local memory) Is this the intended behavior?

I did not find any information about the usage of volatile with regard to register arrays in CUDA documentation.

Here is the ptxas -v output for both versions

With volatile qualifier

    __volatile__ float array[32];

ptxas -v output

ptxas info    : Compiling entry function '_Z2swPcS_PfiiiiS0_' for 'sm_20'
ptxas info    : Function properties for _Z2swPcS_PfiiiiS0_
88 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 47 registers, 16640 bytes smem, 80 bytes cmem[0], 8 bytes cmem[16]

Without volatile qualifier

    float array[32];

ptxas -v output

ptxas info    : Compiling entry function '_Z2swPcS_PfiiiiS0_' for 'sm_20'
ptxas info    : Function properties for _Z2swPcS_PfiiiiS0_
96 bytes stack frame, 100 bytes spill stores, 108 bytes spill loads
ptxas info    : Used 51 registers, 16640 bytes smem, 80 bytes cmem[0], 8 bytes cmem[16]

解决方案

The volatile qualifier specifies to the compiler that all references to a variable (read or write) should result in a memory reference and those references must be in the order specified in the program. The use of the volatile qualifier is illustrated in Chapter 12 of the Shane Cook book, "CUDA Programming".

The use of volatile will avoid some optimizations the compiler can do and so change the number of used registers used. The best way to understand what volatile is actually doing is to disassemble the relevant __global__ function with and without the qualifier.

Consider indeed the following kernel functions

__global__ void volatile_test() {

   volatile float a[3];

   for (int i=0; i<3; i++) a[i] = (float)i;
}

__global__ void no_volatile_test() {

   float a[3];

   for (int i=0; i<3; i++) a[i] = (float)i;
}

Disassembling the above kernel functions one obtains

code for sm_20
      Function : _Z16no_volatile_testv
.headerflags    @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)" 
/*0000*/        MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/        EXIT ;                 /* 0x8000000000001de7 */


      Function : _Z13volatile_testv
.headerflags    @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/        MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */   
/*0008*/        ISUB R1, R1, 0x10;     /* 0x4800c00040105d03 */   R1 = address of a[0]
/*0010*/        MOV32I R2, 0x3f800000; /* 0x18fe000000009de2 */   R2 = 1
/*0018*/        MOV32I R0, 0x40000000; /* 0x1900000000001de2 */   R0 = 2
/*0020*/        STL [R1], RZ;          /* 0xc8000000001fdc85 */
/*0028*/        STL [R1+0x4], R2;      /* 0xc800000010109c85 */   a[0] = 0;
/*0030*/        STL [R1+0x8], R0;      /* 0xc800000020101c85 */   a[1] = R2 = 1;
/*0038*/        EXIT ;                 /* 0x8000000000001de7 */   a[2] = R0 = 2;

As you can see, when NOT using the volatile keyword, the compiler realizes that a is set but never used (indeed, the compiler returns the following warning: variable "a" was set but never used) and there is practically no disassembled code.

Opposite to that, when using the volatile keyword, all references to a are translated to memory references (write in this case).

这篇关于何时将volatile与寄存器/局部变量一起使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆