6.4.2 Workgroup/subgroup-level functions
OpenCL 2.0 introduces a Khronos sub-group extension. Sub-groups are a
logical abstraction of the hardware SIMD execution model akin to
wavefronts, warps, or vectors and permit programming closer to the
hardware in a vendor-independent manner. This extension includes a set
of cross-sub-group built-in functions that match the set of the
cross-work-group built-in functions specified above.
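To make the "matching set" concrete: OpenCL C 2.0 has work-group collectives such as work_group_reduce_add(), and the sub-group extension (cl_khr_subgroups) adds counterparts such as sub_group_reduce_add(). Below is a plain-C sketch of what each one returns. It is a CPU model, not OpenCL code. It assumes a work-group is split into consecutive chunks of a fixed sub-group size, which is only an assumption because the real mapping is implementation-defined. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CPU model (not OpenCL) of a 1-D work-group of `local_size` work-items,
 * ASSUMED to be split into sub-groups of `sg_size` consecutive work-items.
 * Real devices may map work-items to sub-groups differently. */

/* Models sub_group_reduce_add(): the value every work-item of sub-group
 * `sg_id` would receive. */
int model_sub_group_reduce_add(const int *x, size_t local_size,
                               size_t sg_size, size_t sg_id)
{
    assert(sg_size > 0);
    size_t begin = sg_id * sg_size;
    size_t end = begin + sg_size;
    if (end > local_size)       /* the last sub-group may be partial */
        end = local_size;
    int sum = 0;
    for (size_t i = begin; i < end; ++i)
        sum += x[i];
    return sum;
}

/* Models work_group_reduce_add(): the value every work-item of the
 * whole work-group would receive. */
int model_work_group_reduce_add(const int *x, size_t local_size)
{
    int sum = 0;
    for (size_t i = 0; i < local_size; ++i)
        sum += x[i];
    return sum;
}
```

The sub-group version covers only the work-items of one sub-group, while the work-group version covers all of them. Adding up the per-sub-group results gives the work-group result.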
Recommended answer
They must have moved to a more dynamic approach called sub-groups, described in the OpenCL 2.2 specification (https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf):
Sub-group: Sub-groups are an implementation-dependent grouping of work-items within a
work-group. The size and number of sub-groups is implementation-defined.
and
Work-groups are further divided into sub-groups,
which provide an additional level of control over execution.
and
The mapping of work-items to
sub-groups is implementation-defined and may be queried at runtime.
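Here is a rough model of those run-time queries. Inside a kernel, get_num_sub_groups(), get_sub_group_id(), get_sub_group_local_id() and get_sub_group_size() report the mapping. On the host, clGetKernelSubGroupInfo() (core since OpenCL 2.1) can be asked for CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE. The C function below is a hypothetical stand-in for the in-kernel queries. It assumes consecutive work-items fill each sub-group and that only the last sub-group may be partial. A real device is free to map them differently, which is exactly why the values should be queried rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the sub-group queries for a 1-D work-group. ASSUMES
 * sub-groups are consecutive runs of `max_sg` work-items and only the last
 * one may be smaller; real implementations may differ. */
typedef struct {
    size_t num_sub_groups;      /* like get_num_sub_groups()     */
    size_t sub_group_id;        /* like get_sub_group_id()       */
    size_t sub_group_local_id;  /* like get_sub_group_local_id() */
    size_t sub_group_size;      /* like get_sub_group_size()     */
} sg_info;

sg_info model_sub_group_info(size_t local_id, size_t local_size, size_t max_sg)
{
    assert(max_sg > 0 && local_id < local_size);
    sg_info s;
    s.num_sub_groups = (local_size + max_sg - 1) / max_sg;  /* ceiling division */
    s.sub_group_id = local_id / max_sg;
    s.sub_group_local_id = local_id % max_sg;
    size_t first = s.sub_group_id * max_sg;   /* local id of the sub-group's first item */
    size_t remaining = local_size - first;    /* items from there to the end */
    s.sub_group_size = remaining < max_sg ? remaining : max_sg;
    return s;
}
```

For example, with 100 work-items and a maximum sub-group size of 32, this model gives 4 sub-groups. Work-item 99 lands in sub-group 3 at local id 3, and that last sub-group holds only 4 items.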
So even if it is not called a wavefront, it is now queryable at run time, and:
In the absence of synchronization functions (e.g. a barrier),
work-items within a sub-group may be serialized. In the presence of
sub-group functions, work-items within a sub-group may be serialized
before any given sub-group function, between dynamically encountered
pairs of sub-group functions and between a work-group function and
the end of the kernel.
So even lockstep execution within a sub-group may be lost at times.
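One way to picture the serialization rule is to treat the sub-group function as the only point where work-items must meet. Under that reading, a fully serialized execution is still legal: every work-item runs on its own, one at a time, on each side of that point. The sketch below runs a hypothetical three-line kernel fragment (shown in the comment) that way on the CPU. SG_SIZE and run_serialized are made up for illustration. Code that relied on lockstep execution outside sub-group functions would not be guaranteed the same result.

```c
#include <assert.h>
#include <stddef.h>

#define SG_SIZE 8  /* hypothetical sub-group width for this model */

/* Sequential model of one sub-group executing this kernel fragment:
 *     int v = in[lid] * 2;              // phase 1, no synchronization
 *     int s = sub_group_reduce_add(v);  // sub-group function
 *     out[lid] = v + s;                 // phase 2
 * Work-items may be serialized between sub-group functions, so running
 * every item through phase 1, then the collective, then every item
 * through phase 2 is one legal execution order. */
void run_serialized(const int in[SG_SIZE], int out[SG_SIZE])
{
    int v[SG_SIZE];
    for (size_t lid = 0; lid < SG_SIZE; ++lid)  /* phase 1, one item at a time */
        v[lid] = in[lid] * 2;

    int s = 0;                                   /* the collective sees every phase-1 value */
    for (size_t lid = 0; lid < SG_SIZE; ++lid)
        s += v[lid];

    for (size_t lid = 0; lid < SG_SIZE; ++lid)  /* phase 2, one item at a time */
        out[lid] = v[lid] + s;
}
```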
Most importantly:
sub_group_all() and
sub_group_broadcast() and are described in OpenCL C++ kernel language and IL specifications.
The use of these sub-group functions implies sequenced-before relationships between statements
within the execution of a single work-item in order to satisfy data dependencies.
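For reference, here is what the two functions named above compute, modeled in plain C over the values held by one sub-group. In OpenCL C, each work-item calls sub_group_all(predicate) or sub_group_broadcast(x, sub_group_local_id), and every work-item receives the same result. The source id passed to sub_group_broadcast must be the same for every work-item in the sub-group. The model_* functions below are illustrative stand-ins, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* CPU models of two cl_khr_subgroups built-ins, evaluated over the values
 * that the work-items of one sub-group pass in. */

/* Like sub_group_all(predicate): nonzero only if every work-item's
 * predicate is nonzero. */
int model_sub_group_all(const int *pred, size_t sg_size)
{
    for (size_t i = 0; i < sg_size; ++i)
        if (pred[i] == 0)
            return 0;
    return 1;
}

/* Like sub_group_broadcast(x, src): every work-item receives the x held
 * by the work-item whose sub-group local id is `src`. */
int model_sub_group_broadcast(const int *x, size_t sg_size, size_t src)
{
    assert(src < sg_size);  /* src must be a valid local id in the sub-group */
    return x[src];
}
```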
That implies some kind of intra-sub-group communication exists. In addition, OpenCL now defines child kernels:
Device-side enqueue: A mechanism whereby a kernel-instance is enqueued by a kernel-instance
running on a device without direct involvement by the host program. This produces nested
parallelism; i.e. additional levels of concurrency are nested inside a running kernel-instance.
The kernel-instance executing on a device (the parent kernel) enqueues a kernel-instance (the
child kernel) to a device-side command queue. Child and parent kernels execute asynchronously
though a parent kernel does not complete until all of its child-kernels have completed.
Ultimately, with something like
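kernel void launcher()
{
    // Child NDRange: one-dimensional, one work-item in total.
    ndrange_t ndrange = ndrange_1D(1);
    // Enqueue the block as a child kernel on the default device queue. The host must
    // have created that queue with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
    // CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT.
    // CLK_ENQUEUE_FLAGS_WAIT_KERNEL: the child starts only after the parent finishes;
    // CLK_ENQUEUE_FLAGS_NO_WAIT would let it start without waiting for the parent.
    enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
        ^{
            size_t id = get_global_id(0);  // global id within the child kernel
        });
}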
you should be able to spawn your own (upgraded?) wavefronts of whatever size you need. They execute asynchronously with respect to the parent kernel (and their work-items can communicate within a sub-group), but IMHO they are not called wavefronts because their size is not hard-coded by the hardware.
The 2.0 API spec says:
Extreme care should be exercised when writing code that uses
subgroups if the goal is to write portable OpenCL applications.
which reminds me of AMD's 16-wide SIMDs and NVIDIA's 32-wide SIMDs versus some imaginary FPGA's 95-wide compute cores. Pseudo-wavefront, maybe?
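Here is a small illustration of why that warning matters. Code that hard-codes a sub-group width of 32 works on a 32-wide device but silently drops data on a 16-wide one. The sketch below is a toy C model of a hypothetical two-step sum: each sub-group writes one partial sum, then a second step adds the partials. model_two_step_sum and its parameters are made up, and the consecutive mapping of work-items to sub-groups is an assumption. The portable approach is to size the second step from get_num_sub_groups() (or the host-side query) instead of a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a two-step sum over a 1-D work-group (at most 256 items).
 * `actual_sg` is the sub-group width the device really uses;
 * `assumed_sg` is the width the programmer hard-coded. */
int model_two_step_sum(const int *x, size_t local_size,
                       size_t actual_sg, size_t assumed_sg)
{
    assert(actual_sg > 0 && assumed_sg > 0 && local_size <= 256);
    int partial[256] = {0};

    /* Step 1: the device writes one partial per real sub-group
     * (ASSUMING consecutive work-items share a sub-group). */
    for (size_t i = 0; i < local_size; ++i)
        partial[i / actual_sg] += x[i];

    /* Step 2: the code reads as many partials as it *thinks* exist. */
    size_t assumed_num = (local_size + assumed_sg - 1) / assumed_sg;
    int total = 0;
    for (size_t g = 0; g < assumed_num; ++g)
        total += partial[g];
    return total;
}
```

With 64 work-items that each hold 1, a 32-wide device with an assumed width of 32 gives 64. A 16-wide device with the same hard-coded 32 gives 32, because half of the partials are never read.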
This concludes the question "Does the official OpenCL 2.2 standard support WaveFront?"