OpenCL and CUDA registers usage optimization


Question

I'm currently writing an OpenCL kernel (but I suppose it would be the same in CUDA), and at the moment I'm trying to optimize it for NVIDIA GPUs.

I currently use 63 registers in my kernel; the kernel is very big, so it uses all of the GPU's registers. I'm looking for a way to:

1) See which variables are in registers and which end up in global memory (because if I don't have enough registers, it seems the compiler spills variables to global memory).

2) Is there a way to specify which variable is more important (or which should stay in registers)? I use some variables that are present but used less often. Is there a way to give them a priority?

Is there any other optimization strategy when all the registers are already in use?

BTW: I have also tried reading the PTX code and searching for all the ".reg" keywords, but the problem is that the PTX is unreadable; I don't know which register is used for which variable in my code. I haven't found any way to get the correspondence!
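One way to at least see the total register and spill counts (not which variable ends up where) is the compiler's resource-usage report rather than raw PTX. With CUDA, compiling with nvcc -Xptxas -v prints a per-kernel summary ("Used NN registers", spill load/store bytes). With NVIDIA's OpenCL implementation, the vendor build option -cl-nv-verbose puts a similar report in the build log. The host-side sketch below assumes an already-created program and device; the helper name is made up for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Hypothetical helper: build with NVIDIA's verbose option and print the
 * build log, which then contains lines such as "Used 63 registers" and
 * spill load/store byte counts. */
static void print_register_report(cl_program program, cl_device_id device)
{
    /* "-cl-nv-verbose" is an NVIDIA-specific build option. */
    clBuildProgram(program, 1, &device, "-cl-nv-verbose", NULL, NULL);

    size_t log_size = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &log_size);

    char *log = (char *)malloc(log_size);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          log_size, log, NULL);
    printf("%s\n", log);
    free(log);
}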

Thanks

Answer

See which variables are in registers and which are then in global memory

For this, I don't know of a way to check it. However,

Is there a way to specify which variable is more important

One trick I use when I see that I have spilled registers (due to a lack of them, or when I need dynamic indexing into local variables, which is bad) is to explicitly store the ones that I think are not so critical into local memory (called "shared" memory in CUDA).

E.g. before:

uint16 somedata;

After:

__local uint16 somedata[WG_SIZE]; // or __local uint someadata[16];

But beware: if your local memory usage increases greatly, you risk a penalty because the number of in-flight wavefronts will be lower (i.e. you might have lower occupancy).
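To make the trick concrete, here is a minimal kernel sketch that moves one rarely-used value out of private registers into local (shared) memory; the kernel name, arguments and the fixed work-group size of 64 are assumptions for illustration, and each work-item indexes its own slot with get_local_id:

#define WG_SIZE 64   /* assumed fixed work-group size */

__kernel void example_kernel(__global const float *in, __global float *out)
{
    /* Instead of "float rarely_used;" (a private register), keep one
       slot per work-item in local/shared memory. */
    __local float rarely_used[WG_SIZE];

    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    rarely_used[lid] = in[gid] * 0.5f;   /* written once */

    /* ... register-hungry main computation goes here ... */

    out[gid] = rarely_used[lid] + 1.0f;  /* read back much later */
}

No barrier is needed here because each work-item only touches its own slot; the cost is the extra local-memory footprint mentioned above.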

Hope this helps.
