OpenCL: Work items, Processing elements, NDRange


Question

My classmates and I are being confronted with OpenCL for the first time. As expected, we ran into some issues. Below I have summarized the issues we had and the answers we found. However, we're not sure that we got it all right, so it would be great if you could take a look at both our answers and the questions below them.

Why didn't we split this up into separate questions?

  1. They partly relate to each other.
  2. We think these are typical beginner's questions. The fellow students we consulted all replied "Well, I didn't understand that either."

Work items vs. Processing elements

Most of the lectures on OpenCL that I have seen use the same illustration to introduce compute units and processing elements as well as work groups and work items. This has led my classmates and me to continuously confuse these concepts. We therefore came up with a definition that emphasizes the fact that processing elements are very different from work items:

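  • A work item is a kernel that is being executed, whereas a processing element is an abstract model that represents something that actually does computations. A work item is something that exists only temporarily in software, while a processing element abstracts something that physically exists in hardware. However, depending on the hardware and therefore on the OpenCL implementation, a work item might be mapped to and executed by some piece of hardware that is represented by a so-called processing element.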

Question 1: Is this correct? Is there a better way to express this?

This is how we perceive the concept of NDRange:

  • The number of work items that exist is represented by the NDRange size. This is commonly also referred to as the global size. The NDRange can be one-, two-, or three-dimensional ("ND"):
      • A one-dimensional problem would be some computation on a linear vector. If the vector's size is 64 and there are 64 work items to process that vector, then the NDRange size equals 64.
      • A two-dimensional problem would be some computation on an image. In the case of a 1024x768 image, the NDRange size Gx would be 1024 and the NDRange size Gy would be 768. This assumes that there are 1024x768 work items, one to process each pixel of that image. The NDRange size then equals 1024x768.
      • A three-dimensional example would be some computation on a 3D model. In that case there is additionally an NDRange size Gz.
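
To make the sizes above concrete, here is a minimal sketch in plain C (no OpenCL calls) that multiplies the per-dimension global sizes to get the total number of work items. The helper name `ndrange_work_items` is hypothetical and not part of the OpenCL API; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API call): total number of
 * work items in an NDRange with `dims` dimensions (1, 2 or 3) and
 * per-dimension global sizes g[0] = Gx, g[1] = Gy, g[2] = Gz. */
size_t ndrange_work_items(unsigned dims, const size_t g[])
{
    assert(dims >= 1 && dims <= 3);
    size_t total = 1;
    for (unsigned d = 0; d < dims; ++d)
        total *= g[d];   /* Gx, then Gx*Gy, then Gx*Gy*Gz */
    return total;
}
```

For the examples in the list, a 1D range with Gx = 64 gives 64 work items, and a 2D range with Gx = 1024 and Gy = 768 gives 1024 * 768 work items.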

Question 2: Again, is this correct?

Question 3: These dimensions are simply there for convenience, right? One could just as well store the color values of each pixel of an image in a linear vector of size width * height. The same is true for any 3D problem.
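
The flattening described in Question 3 can be sketched as follows, assuming row-major storage (one row of `width` pixels after another). The function names are made up for illustration; inside a real kernel the x and y values would come from the built-in work-item functions instead.

```c
#include <assert.h>
#include <stddef.h>

/* Map a 2D pixel coordinate to its position in a linear vector of
 * size width * height (row-major layout is an assumption here). */
size_t linear_index(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return y * width + x;
}

/* Inverse mapping: recover (x, y) from the linear position. */
void index_to_xy(size_t i, size_t width, size_t *x, size_t *y)
{
    assert(width > 0);
    *x = i % width;
    *y = i / width;
}
```

With this mapping a 1D NDRange of width * height work items addresses the same pixels as a 2D NDRange of width x height, which is why the extra dimensions are largely a matter of convenience.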

Question 4: We were told that the execution of kernels (in other words: work items) can be synchronized within a work group using barrier(CLK_LOCAL_MEM_FENCE);. Understood. We were also (repeatedly) told that work groups cannot be synchronized with each other. Alright. But then what's the use of barrier(CLK_GLOBAL_MEM_FENCE);?

Question 5: In our host program, we specify a context that consists of one or more devices from one of the available platforms. However, we can only enqueue kernels in a so-called command queue that is linked to exactly one device (which has to be in the context). Again: the command queue is not linked to the previously defined context, but to a single device. Right?

Recommended answer

Question 1: Almost correct. A work-item is an instance of a kernel (see paragraph 2 of section 3.2 of the standard). See also the definition of processing element from the standard:

Processing Element: A virtual scalar processor. A work-item may execute on one or more processing elements.

See also the answer I provided to that question.

Question 2 & 3: Whether you use more than one dimension, or exactly as many work-items as you have data elements to process, depends on your problem. It's up to you and on what makes development easier. Note also that OpenCL 1.2 and below impose a constraint that forces the global size to be a multiple of the work-group size (removed in OpenCL 2.0).
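
A common way to satisfy that constraint is to round the global size up to the next multiple of the work-group size. The following is only a sketch in plain C; `round_up_global_size` is a hypothetical helper name, not an OpenCL function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest multiple of `local` (the work-group
 * size) that is greater than or equal to `n` (the number of data
 * elements to process). */
size_t round_up_global_size(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

For example, 1000 elements with a work-group size of 64 would be padded to a global size of 1024. The surplus work-items then have no element to process, so the kernel needs a bounds check on its global ID before touching memory.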

Question 4: Yes, synchronization during the execution of a kernel is only possible within a work-group, by means of barriers. The difference between the flags you pass as the parameter is the type of memory they refer to. With CLK_LOCAL_MEM_FENCE, all work-items have to make sure that the data they write to local memory is visible to the other work-items in the work-group. With CLK_GLOBAL_MEM_FENCE it's the same, but for global memory; the barrier itself still only applies to the work-items of one work-group.

Question 5: Within a context you can have several devices, each of which can have several command-queues. As you stated, a command-queue is linked to one device, but you can enqueue your kernels in different command-queues belonging to different devices. Note that if two command-queues try to access the same memory object without synchronization, you get undefined behavior. You'd typically use two or more command-queues when their respective jobs are not related.

However, you can synchronize command-queues through events, and you can also create your own events (called user events). See section 5.9 of the standard for events and section 5.10 for user events.

I'd advise you to read at least the first chapters (1 to 5) of the standard. If you're in a hurry, read at least chapter 2, which is actually the glossary.
