Why is the preferred work-group size multiple part of the kernel properties?


Question

From what I understand, the preferred work-group size depends roughly on the SIMD width of the compute device (on NVIDIA hardware this is the warp size; on AMD the term is wavefront).

Logically, that would lead one to assume that the preferred work-group size is device dependent, not kernel dependent. However, this property must be queried for a particular kernel, using CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. Choosing a value that isn't a multiple of the underlying hardware's SIMD width would not fully load the hardware, resulting in reduced performance, and that should hold regardless of which kernel is being executed.
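To make the "not fully loaded" point concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which the hardware always schedules whole warps/wavefronts, so a partially filled one still occupies every lane. The function name `lane_utilization` and the SIMD width of 32 are illustrative assumptions, not part of OpenCL.

```python
def lane_utilization(work_group_size: int, simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work for one work-group.

    Simplified model (an assumption): the device executes whole
    warps/wavefronts, so the work-group is padded up to a multiple
    of simd_width and the padding lanes sit idle.
    """
    if work_group_size <= 0 or simd_width <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: number of warps/wavefronts needed.
    batches = -(-work_group_size // simd_width)
    return work_group_size / (batches * simd_width)


if __name__ == "__main__":
    # Assumed SIMD width of 32 (typical warp size), for illustration only.
    for size in (32, 48, 64, 100):
        print(f"work-group size {size:3d}: {lane_utilization(size, 32):.0%} of lanes used")
```

Under this model, a work-group of 48 items on a 32-wide device needs two warps and leaves a quarter of the lanes idle, whereas 32 or 64 items use every lane.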

My question is: why is this not the case? Surely this design decision wasn't completely arbitrary. Are there underlying implementation limitations, or are there cases where this property really should be a kernel property?

Recommended answer

After reading through section 6.7.2 of the OpenCL 1.2 specification, I found that a kernel is allowed to carry compiler attributes, written with the __attribute__ keyword, that specify either a required or a recommended work-group size. That information can only be passed back to the host if the preferred work-group size multiple is a kernel property rather than a device property.

The theoretically best work-group size may be a device-specific property, but it won't necessarily work best for a specific kernel, or work at all. For example, what works best may be a multiple of 2*CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, or something else altogether.
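Because the best size is only known to be some multiple of the reported value, one reasonable approach is to enumerate the candidates and benchmark them. The sketch below is an assumption-laden illustration, not host code: the helper names are hypothetical, and the inputs 32 and 256 stand in for values that would really come from `clGetKernelWorkGroupInfo` (queried with `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` and `CL_KERNEL_WORK_GROUP_SIZE`).

```python
def candidate_local_sizes(preferred_multiple: int, max_work_group_size: int) -> list[int]:
    """Every multiple of preferred_multiple that fits in max_work_group_size."""
    if preferred_multiple <= 0 or max_work_group_size <= 0:
        raise ValueError("sizes must be positive")
    return list(range(preferred_multiple, max_work_group_size + 1, preferred_multiple))


def padded_global_size(n_items: int, local_size: int) -> int:
    """Round n_items up to the next multiple of local_size."""
    if n_items < 0 or local_size <= 0:
        raise ValueError("n_items must be >= 0 and local_size > 0")
    return -(-n_items // local_size) * local_size  # ceiling division


if __name__ == "__main__":
    # Assumed query results, for illustration only.
    preferred_multiple, max_wg_size = 32, 256
    for local in candidate_local_sizes(preferred_multiple, max_wg_size):
        print(f"local={local:3d} global={padded_global_size(1000, local)}")
```

Each candidate would then be timed with the actual kernel. When the global size is padded like this, the kernel needs a bounds check on its global ID so the extra work-items do nothing.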
