Estimate OpenCL Register Use

Question

Is there a rule of thumb for keeping the compiler happy when it looks at a kernel and assigns registers?

The compiler has a lot of flexibility, but I worry that it might start using excessive local memory if I create, say, 500 variables in my kernel, or write one very long line with a ton of operations.

I know the only way my program could really examine register use on a specific device is with the AMD SDK or the NVIDIA SDK (or by comparing the assembly code against the device's architecture). Unfortunately, I am using PyOpenCL, so working with those SDKs would be impractical.

My program generates semi-random kernels, and I'm trying to prevent it from doing things that would choke the compiler and make it start dumping registers into local memory.

Recommended answer

The compiler keeps track of the scope of each private variable. What matters is not the number of variables you declare, but how they are used.

For example, the following code uses only 2 registers, even though it declares 5 private variables:

// Notice that a register is occupied only when a value has to be stored,
// not when the variable is declared. A variable that is declared but never
// used is optimized away by the compiler.

R1 | R2 |  Code
 a |  - |  int a = 1;
 a |  b |  int b = 3;
 a |  b |  int c;
 c |  b |  c = a + b;
 c |  b |  int d;
 c |  d |  d = c + b;
 c |  d |  int e;
 e |  - |  e = c + d;
 - |  - |  out[idx] = e; //Global memory output

It all depends on the scope of the variable, that is, its live range: when it is needed, whether it is needed at all, and for how long.
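The register table above can be reproduced with a small liveness count. What follows is only a rough sketch for a kernel generator written in Python, not what any real OpenCL compiler does: it assumes straight-line code with no branches or loops, one register per scalar value, and no instruction reordering. The function name `max_live_values` and the `(target, sources)` statement format are made up for this illustration.

```python
def max_live_values(statements):
    """Estimate the peak number of values that are live at the same time.

    Each statement is a (target, sources) pair. Use target=None for a
    store to memory such as out[idx] = e, which defines no new value.
    """
    live = set()  # values still needed after the current statement
    peak = 0
    # Walk backwards: a value is live from its definition to its last use.
    for target, sources in reversed(statements):
        if target is not None and target not in live:
            continue  # result is never read: model dead-code elimination
        live.discard(target)   # the value does not exist before its definition
        live.update(sources)   # operands must be live just before this statement
        peak = max(peak, len(live))
    return peak


# The example from the answer, written as (target, sources) pairs.
program = [
    ("a", []),          # int a = 1;
    ("b", []),          # int b = 3;
    ("c", ["a", "b"]),  # c = a + b;
    ("d", ["c", "b"]),  # d = c + b;
    ("e", ["c", "d"]),  # e = c + d;
    (None, ["e"]),      # out[idx] = e;
]

print(max_live_values(program))  # 2
```

A generator could run a count like this on each candidate kernel and reject those whose peak is far above what the target device is expected to hold in registers. Treat the number as a rough lower bound: real compilers add temporaries, handle vector types, and may reorder instructions.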

The one critical thing is NOT to allocate more private memory than needed when the compiler cannot predict how that memory will be accessed.

 int a[100];
 //Initialize a with some value
 int b;
 b = a[global_index];

The compiler cannot predict which element you will read, so it has to keep all 100 values and will spill them to memory if needed. For this kind of operation it is better to create a lookup table, or even do a single read from a global table.
