How many threads (or work-items) can run at the same time?

Question

I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL.

My question is how to compute the limit of a GPU device (in number of threads).
From what I understand, there are a number of work-groups (the equivalent of blocks in CUDA), each containing a number of work-items (roughly CUDA threads).

  • How do I get the number of work-groups present on my card (and that can run at the same time), and the number of work-items present in one work-group?

What does CL_DEVICE_MAX_COMPUTE_UNITS correspond to?
The Khronos specification speaks of cores ("The number of parallel compute cores on the OpenCL device."). What is the difference from the CUDA cores given in the specification of my graphics card? In my case OpenCL reports 14, while my GeForce 8800 GT has 112 cores according to the NVIDIA website.

Does CL_DEVICE_MAX_WORK_GROUP_SIZE (512 in my case) correspond to the total number of work-items given to a specific work-group, or to the number of work-items that can run at the same time in a work-group?

Any suggestions would be greatly appreciated.

Recommended answer

The OpenCL standard does not specify how the abstract execution model provided by OpenCL is mapped to the hardware. You can enqueue any number T of threads (work-items) and provide a workgroup size (WG), subject to at least the following constraints (see OpenCL spec 5.7.3 and 5.8 for details):

  • WG must divide T
  • WG must be at most CL_DEVICE_MAX_WORK_GROUP_SIZE
  • WG must be at most the CL_KERNEL_WORK_GROUP_SIZE returned by clGetKernelWorkGroupInfo; this may be smaller than the device's maximum workgroup size if the kernel consumes a lot of resources.
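
The three constraints above can be checked before a kernel is enqueued. What follows is a minimal Python sketch of that arithmetic, not OpenCL host code. The function names `check_work_group_size` and `round_up_global_size` are hypothetical, and the two limits are passed in as plain integers. In a real program they would be obtained from `clGetDeviceInfo` (with `CL_DEVICE_MAX_WORK_GROUP_SIZE`) and `clGetKernelWorkGroupInfo` (with `CL_KERNEL_WORK_GROUP_SIZE`).

```python
def check_work_group_size(total_items, wg, device_max_wg, kernel_max_wg):
    """Return the list of violated constraints (empty if wg is acceptable).

    total_items   -- T, the number of work-items to enqueue
    wg            -- the requested work-group size
    device_max_wg -- assumed to come from CL_DEVICE_MAX_WORK_GROUP_SIZE
    kernel_max_wg -- assumed to come from CL_KERNEL_WORK_GROUP_SIZE
    """
    if wg <= 0:
        return ["WG must be positive"]
    problems = []
    if total_items % wg != 0:
        problems.append("WG must divide T")
    if wg > device_max_wg:
        problems.append("WG exceeds the device maximum")
    if wg > kernel_max_wg:
        problems.append("WG exceeds the kernel maximum")
    return problems


def round_up_global_size(total_items, wg):
    """Smallest multiple of wg that is greater than or equal to total_items."""
    return ((total_items + wg - 1) // wg) * wg
```

For example, with the device limit of 512 quoted in the question and an assumed kernel limit of 256, `check_work_group_size(1000, 256, 512, 256)` reports that WG does not divide T. One common workaround is to pad T with `round_up_global_size` and have the kernel skip the extra work-items with a bounds check.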

The implementation manages the execution of the kernel on the hardware. All threads of a single workgroup must be scheduled on a single "multiprocessor", but a single multiprocessor can manage several workgroups at the same time.

Threads inside a workgroup are executed in groups of 32 (NVIDIA warp) or 64 (AMD wavefront). Each micro-architecture does this in a different way. You will find more details in the NVIDIA and AMD forums, and in the various docs provided by each vendor.
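
As a rough illustration of this grouping, the Python sketch below counts how many warps or wavefronts one workgroup occupies. The helper name `hardware_groups` is hypothetical, and the model is deliberately simplified: it uses only the widths of 32 and 64 mentioned above and ignores how a particular micro-architecture actually schedules them.

```python
NVIDIA_WARP = 32      # group width quoted above for NVIDIA
AMD_WAVEFRONT = 64    # group width quoted above for AMD


def hardware_groups(wg, group_width):
    """Return (number of warps/wavefronts, unused lanes in the last one)."""
    groups = -(-wg // group_width)           # ceiling division
    unused = groups * group_width - wg       # lanes with no work-item
    return groups, unused
```

With WG=512, `hardware_groups(512, NVIDIA_WARP)` gives 16 full warps. With WG=100, the result is 4 warps, the last of which has 28 unused lanes. This is one reason workgroup sizes are usually chosen as a multiple of the group width.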

To answer your question: there is no limit to the number of threads you can enqueue. In the real world, your problem is limited by the size of the inputs/outputs, i.e. the size of the device memory. To process a 4 GB buffer of floats, you can enqueue 1G threads, with WG=256 for example. The device will then have to schedule 4M workgroups on its small number (say between 2 and 40) of multiprocessors.
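
The numbers in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, and it makes three assumptions: "4 GB" is read as 4 GiB, each thread processes exactly one 4-byte float, and the device has the 14 compute units reported in the question.

```python
GIB = 1 << 30

buffer_bytes = 4 * GIB     # the 4 GB buffer, read as 4 GiB (assumption)
float_bytes = 4            # size of a 32-bit float
wg = 256                   # workgroup size from the example
compute_units = 14         # value reported in the question

threads = buffer_bytes // float_bytes              # one thread per float
work_groups = threads // wg                        # workgroups to schedule
groups_per_cu = -(-work_groups // compute_units)   # ceiling division

print(threads, work_groups, groups_per_cu)
```

This gives 1,073,741,824 threads (1G) and 4,194,304 workgroups (4M), or roughly 300,000 workgroups for each of the 14 compute units to work through over the lifetime of the kernel.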
