Is CL_DEVICE_LOCAL_MEM_SIZE for the entire device, or per work-group?

Question

I'm not quite clear on the actual meaning of CL_DEVICE_LOCAL_MEM_SIZE, which is obtained through the clGetDeviceInfo function. Does this value indicate the total of all available local memory on a device, or the upper limit on the local memory available to a single work-group?

Recommended answer

TL;DR: It is per compute unit, and therefore also the maximum amount that a single work-group can be allocated.

This value is the amount of local memory available on each compute unit in the device. Since a work-group is assigned to a single compute unit, this is also the maximum amount of local memory that any work-group can have.
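To make the distinction concrete, here is a minimal sketch in plain C. The helper names fits_in_work_group and aggregate_local_mem are hypothetical and exist only for illustration; local_mem_size stands for the number that clGetDeviceInfo reports for CL_DEVICE_LOCAL_MEM_SIZE, and compute_units for CL_DEVICE_MAX_COMPUTE_UNITS. The sketch assumes the per-compute-unit reading given in the answer.

```c
#include <stdbool.h>

/* Hypothetical helper: does one work-group's local-memory request fit?
 * The request is compared against local_mem_size itself, never against
 * local_mem_size multiplied by the number of compute units. */
static bool fits_in_work_group(unsigned long long requested_bytes,
                               unsigned long long local_mem_size)
{
    return requested_bytes <= local_mem_size;
}

/* Hypothetical helper, shown only for contrast: the local memory summed
 * over all compute units. No single work-group can use this total,
 * because a work-group runs on exactly one compute unit. */
static unsigned long long aggregate_local_mem(unsigned long long local_mem_size,
                                              unsigned int compute_units)
{
    return local_mem_size * compute_units;
}
```

With an assumed 32 KiB reported per compute unit and 16 compute units, the device holds 512 KiB of local memory in aggregate, yet a work-group asking for 48 KiB still does not fit.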

For performance reasons, on many GPUs it is usually desirable to have multiple work-groups running on each compute unit concurrently (to hide memory access latency, for example). If one work-group uses all of the available local memory, the device cannot schedule any other work-group onto the same compute unit until that work-group has finished. If possible, limit the amount of local memory each work-group uses (to a quarter of the total local memory, for example) so that multiple work-groups can run on the same compute unit concurrently.
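The budgeting advice above comes down to simple integer arithmetic, sketched here in plain C. The function names local_mem_budget and max_resident_groups are hypothetical, and the result is only an upper bound derived from local memory: the real number of concurrent work-groups also depends on registers, work-group size, and other hardware limits that this sketch ignores.

```c
/* Hypothetical helper: local-memory budget per work-group if we want
 * target_groups work-groups to share one compute unit.
 * local_mem_size is the value reported for CL_DEVICE_LOCAL_MEM_SIZE. */
static unsigned long long local_mem_budget(unsigned long long local_mem_size,
                                           unsigned long long target_groups)
{
    if (target_groups == 0)
        return 0; /* treated as invalid input in this sketch */
    return local_mem_size / target_groups;
}

/* Hypothetical helper: upper bound on work-groups that can be resident
 * on one compute unit, considering local memory only. Returns 0 when a
 * single work-group already exceeds the limit. */
static unsigned long long max_resident_groups(unsigned long long local_mem_size,
                                              unsigned long long bytes_per_group)
{
    if (bytes_per_group == 0)
        return 0; /* treated as invalid input in this sketch */
    return local_mem_size / bytes_per_group;
}
```

For an assumed 32768 bytes of local memory, a target of four concurrent work-groups gives a budget of 8192 bytes each, while a kernel that uses all 32768 bytes leaves room for only one work-group per compute unit. The local memory a compiled kernel actually uses can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_LOCAL_MEM_SIZE.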
