OpenCL and CUDA register usage optimization
Question
I'm currently writing an OpenCL kernel (I suppose it would be the same in CUDA), and I'm trying to optimize it for NVIDIA GPUs.
My kernel currently uses 63 registers. It is very large, so it uses all the available GPU registers. I'm looking for a way to:
1) See which variables are in registers and which end up in global memory (when there are not enough registers, the compiler seems to place variables in global memory).
2) Specify which variables are more important, i.e. which ones should stay in registers. Some of my variables are kept around but rarely used. Is there a way to assign priorities?
Are there other optimization strategies once all the registers are already in use?
BTW: I have also tried reading the PTX code and searching for all the ".reg" keywords, but the PTX is unreadable: I can't tell which register holds which variable in my code, and I haven't found any way to get the correspondence.
Thanks.
Recommended answer
See which variables are in registers and which are then in global memory
I don't know of a way to check this.
Is there a way to specify which variable is more important
One trick I use when I see spilled registers (either because there are too few of them, or because I need dynamic indexing into local variables, which is bad) is to explicitly store the values I consider less critical in local memory (called "shared" memory in CUDA).
For example, before:
uint16 somedata;
After:
__local uint16 somedata[WG_SIZE]; // or __local uint somedata[16];
But beware: if your local memory usage grows substantially, you risk a penalty, because fewer wavefronts (warps on NVIDIA hardware) can be in flight, i.e. you might get lower occupancy.
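As a rough illustration of the before/after snippets above, here is a minimal sketch of a complete kernel that parks a rarely used value in local memory. The kernel name `scale_rows`, the variable `rarely_used`, the value `WG_SIZE = 64`, and the placeholder "hot path" are all assumptions made for this example, not part of the original answer. Each work-item uses only its own slot, indexed by `get_local_id(0)`, so the local array behaves like per-work-item storage.

```opencl
// Hypothetical example: WG_SIZE, scale_rows and rarely_used are illustrative names.
#define WG_SIZE 64

// Fix the work-group size so the local array has exactly one slot per work-item.
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void scale_rows(__global const uint16 *in, __global uint16 *out)
{
    // One uint16 (64 bytes) per work-item: 64 * 64 bytes = 4 KB per work-group.
    __local uint16 rarely_used[WG_SIZE];

    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    // Park the 16-component value in local memory instead of holding it in registers.
    rarely_used[lid] = in[gid];

    // ... register-hungry hot path that does not touch rarely_used ...

    // Read the value back only at the point where it is actually needed.
    // No barrier is required because each work-item reads only its own slot.
    out[gid] = rarely_used[lid] * (uint16)(2u);
}
```

With these assumed sizes the kernel takes 4 KB of local memory per work-group, which is the cost to weigh against occupancy as described above. The global work size must be a multiple of `WG_SIZE` because of the `reqd_work_group_size` attribute.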
Hope this helps.