OpenCL: When to use global, private, local, constant address spaces


Question

I'm trying to learn OpenCL but I'm having a hard time deciding which address spaces to use, as I only find resources declaring what these address spaces are, not why they exist or when to use them. What resources there are, are too scattered, so with this question I hope to assemble all this information: what are all the address spaces, why do they exist, when to use which address space, and what are the advantages and disadvantages regarding memory and performance.

As I understand it (which is probably too simplified), the GPU has two physical types of memory: global memory, far from the actual processors, so slow but pretty big and available to all workers, and local memory, close to the actual processors, so fast but small and not accessible from other workers.

Intuitively, the local qualifier makes sure a variable is placed on local memory and the global qualifier makes sure a variable is placed on global memory, though I'm not sure this is exactly what happens. This leaves the private and constant qualifiers. What's the purpose of those?

There also are some implicit qualifiers. For example, the specifications mention the generic address space, which is used for arguments with no qualifiers, I think. What does this do exactly? Then there also are local function variables. What's the address space for those?

Here is an example using my intuition, but without knowing what I'm actually doing:

Example: Say I pass an array of type long and length 10000 to a kernel which I will only use to read, then I would declare it global const as it must be available to all workers and it will not change. Why wouldn't I use the constant qualifier? When setting the buffer for this array via the CPU, I actually also just could have made the array read-only, which in my eyes says the same as declaring it const. So again, when and why would I declare something constant or global const?

When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More generally: when is it worth copying data from global to local memory?

Say I also want to pass the length of this array, then I would add const int length to the arguments of my kernel, but I'm unsure why I would omit the global qualifier except because I have seen other people do it. After all, length must be accessible for all workers. If I'm right, then length would have a generic address space, but again, I don't really know what that means.

I hope someone with some experience can clear this up. That would be great not only for me, but I hope also for other enthusiasts who want to gain some practical knowledge concerning memory management on the GPU.

Answer

Constant: A small, cached portion of global memory visible to all workers. It is read-only; use it when you can.

Global: Slow, visible to all workers, readable and writable. It is where all your data ends up, so some accesses to it are always necessary.

Local: Do you need to share something within a work-group? Use local! Do all the workers in a group access the same global memory? Use local! Local memory is visible only to the workers of one work-group and is limited in size, but it is very fast.

Private: Memory that is visible only to a single worker; think of it as registers. Variables declared inside a kernel without a qualifier are private by default.
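
To make the four spaces concrete, here is a minimal sketch of a kernel that touches each of them. The kernel body is ordinary OpenCL C; everything under "host-side shim" (the empty qualifier macros, the fake get_global_id, and the run_scale_offset driver) is an assumption added only so the snippet compiles with a plain C compiler, and is not part of OpenCL. The name scale_offset is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* --- Host-side shim: an assumption for illustration, NOT part of OpenCL. ---
   In real OpenCL C these are keywords and get_global_id is a built-in. */
#define kernel
#define global
#define constant const   /* constant memory is read-only */
#define private
static size_t shim_gid;
static size_t get_global_id(unsigned dim) { (void)dim; return shim_gid; }

/* --- Kernel: valid OpenCL C once the shim above is removed. --- */
kernel void scale_offset(global const float *in,  /* global: large, slow, seen by all workers */
                         constant float *coeff,   /* constant: small, cached, read-only */
                         global float *out,       /* global, written by the kernel */
                         const int length)        /* by-value argument: one private copy per worker */
{
    size_t i = get_global_id(0);   /* no qualifier: private by default */
    if (i >= (size_t)length)
        return;                    /* guard for a global size rounded up past length */
    private float x = in[i];       /* explicit private, same as the default */
    out[i] = coeff[0] * x + coeff[1];
}

/* Hypothetical driver: runs the kernel once per work-item, sequentially. */
static void run_scale_offset(const float *in, const float *coeff,
                             float *out, int length, size_t global_size)
{
    assert(length >= 0);
    for (shim_gid = 0; shim_gid < global_size; ++shim_gid)
        scale_offset(in, coeff, out, length);
}
```

Local memory is left out here on purpose, because it only pays off when workers in a group share data.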


> Say I pass an array of type long and length 10000 to a kernel which I will only use to read, then I would declare it global const as it must be available to all workers and it will not change. Why wouldn't I use the constant qualifier?

Actually, yes, you can and should use the constant qualifier, provided the array fits in the device's constant memory. It places your data in constant memory (a small portion of read-only memory that all workers can access quickly). This is what GPUs use to transfer uniforms to all vertex shaders.
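
Whether the array fits is simple arithmetic, sketched below. An OpenCL long is 64 bits, so 10000 of them take 80,000 bytes, while, as far as I know, the full profile only guarantees 64 KB of constant memory. The helper fits_in_constant is hypothetical; in real host code the limit would come from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE rather than being hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `count` elements of `elem_size` bytes
   fit in `max_constant_bytes`, 0 otherwise.
   In real code, max_constant_bytes would be queried from the device via
   clGetDeviceInfo(..., CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, ...). */
static int fits_in_constant(size_t count, size_t elem_size,
                            size_t max_constant_bytes)
{
    assert(elem_size > 0);
    if (count > SIZE_MAX / elem_size)
        return 0;                          /* byte count would overflow size_t */
    return count * elem_size <= max_constant_bytes;
}
```

With a 64 KB limit, fits_in_constant(10000, 8, 64 * 1024) is false, so on such a device the array from the question would have to stay global const, whereas an array of length 10 fits easily.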

> When setting the buffer for this array via the CPU, I actually also just could have made the array read-only, which in my eyes says the same as declaring it const. So again, when and why would I declare something constant or global const?

Not really. When you create a read-only buffer you are only telling OpenCL that you plan to read from it, so the runtime can optimize behind the scenes; the compiler will not stop a kernel from writing to it through a non-const pointer (the specification leaves the result of such a write undefined). global const is just a safeguard for the developer: if you accidentally write through the pointer, you get an error at compile time. It is basically the same as const in plain C on the host side, and programs also work fine if no memory is declared const.

> When performing memory-intensive tasks, would it be better to copy the array to a local array inside the kernel? My guess is that local memory would be too small, but what if the array only had a length of 10? When would the array be too big/small? More generally: when is it worth copying data from global to local memory?

It is only worth it if the same data is read by all workers in the group. If each worker reads a single, different value from global memory, then it is not worth it. Useful here:

Worker0 -> Reads 0,1,2,3
Worker1 -> Reads 0,1,2,3
Worker2 -> Reads 0,1,2,3
Worker3 -> Reads 0,1,2,3

Not useful here:

Worker0 -> Reads 0
Worker1 -> Reads 1
Worker2 -> Reads 2
Worker3 -> Reads 3
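
The difference between the two patterns can be counted. The sketch below is plain C that emulates one work-group of four workers sequentially; it is not OpenCL, and global_reads, load_global, sum_direct and sum_staged are all hypothetical names used only to instrument the "useful" case. The tile array stands in for a local array, and the comment marks where a real kernel would call barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4

static size_t global_reads;   /* instrumentation counter, illustration only */

/* Every read from "global memory" goes through here so it can be counted. */
static long load_global(const long *data, size_t i)
{
    ++global_reads;
    return data[i];
}

/* Each worker sums all GROUP_SIZE inputs straight from global memory. */
static void sum_direct(const long *in, long *out)
{
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {   /* one iteration per worker */
        long acc = 0;                                 /* private accumulator */
        for (size_t k = 0; k < GROUP_SIZE; ++k)
            acc += load_global(in, k);
        out[lid] = acc;
    }
}

/* Same result, but the group first stages the inputs in a shared tile. */
static void sum_staged(const long *in, long *out)
{
    long tile[GROUP_SIZE];                            /* stands in for `local long tile[]` */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)     /* phase 1: each worker copies one value */
        tile[lid] = load_global(in, lid);
    /* a real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {   /* phase 2: all reads hit the tile */
        long acc = 0;
        for (size_t k = 0; k < GROUP_SIZE; ++k)
            acc += tile[k];
        out[lid] = acc;
    }
}
```

With four workers, the direct version performs 16 global reads and the staged version 4. In the "not useful" pattern every worker reads only its own element, so staging would still cost 4 global reads and save nothing.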

> Say I also want to pass the length of this array, then I would add const int length to the arguments of my kernel, but I'm unsure why I would omit the global qualifier except because I have seen other people do it. After all, length must be accessible for all workers. If I'm right, then length would have a generic address space, but again, I don't really know what that means.

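A by-value parameter such as const int length is not a pointer, so it takes no address space qualifier; only pointer parameters have to name global, constant or local. Each worker sees its own private copy of the value. Implementations typically deliver such arguments through constant memory, which is what you want for those small elements: fast access by all workers.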

The rule of thumb for pointer parameters is: if the data is only read and fits in constant memory, use constant; otherwise use global.
