Structure of Arrays vs Array of Structures in CUDA

Question

From some comments I have read here, it seems that Structure of Arrays (SoA) is preferable to Array of Structures (AoS) for parallel implementations such as CUDA. If that is true, can anyone explain why? Thanks in advance!

Recommended answer

The choice between AoS and SoA for best performance usually depends on the access pattern. This is not limited to CUDA: similar considerations apply to any architecture where performance is significantly affected by the memory access pattern, e.g. where you have caches or where contiguous memory access is faster (e.g. coalesced memory accesses in CUDA).

For example, consider RGB pixels versus separate RGB planes:

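#include <stdint.h>

#define N 1024  /* number of pixels (example value, not from the original) */

/* Array of Structures: the r, g, b of one pixel are adjacent in memory */
struct {
    uint8_t r, g, b;
} AoS[N];

/* Structure of Arrays: each color plane is contiguous in memory */
struct {
    uint8_t r[N];
    uint8_t g[N];
    uint8_t b[N];
} SoA;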

If you are going to access the R, G and B components of each pixel together, then AoS usually makes sense, since the successive reads of the R, G and B components are contiguous and usually fall within the same cache line. For CUDA this also means memory read/write coalescing.
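As a rough illustration of the per-pixel access pattern, here is a minimal host-side C sketch that averages the three channels of each pixel in an AoS buffer. The names `PixelAoS` and `gray_aos` are hypothetical and not from the original answer; in a CUDA kernel the loop index would typically be replaced by a thread index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical named pixel type for this sketch. */
typedef struct {
    uint8_t r, g, b;
} PixelAoS;

/* Average the three channels of each pixel.
   All three reads for pixel i touch adjacent bytes. */
void gray_aos(const PixelAoS *px, uint8_t *gray, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)px[i].r + px[i].g + px[i].b;
        gray[i] = (uint8_t)(sum / 3u);
    }
}
```

Because every iteration consumes a whole pixel, no fetched byte of the pixel array goes unused.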

However, if you are going to process the color planes separately, then SoA may be preferable. For example, if you want to scale all R values by some factor, SoA means that all R components are contiguous.
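The difference in stride can be sketched in plain C as follows. The function names are hypothetical, the scale factor is assumed to be non-negative, and results are saturated at 255; this is only an illustration of the two access patterns, not a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale one byte by a non-negative factor, saturating at 255. */
static uint8_t scale_u8(uint8_t v, float factor)
{
    float s = (float)v * factor;
    return (uint8_t)(s > 255.0f ? 255.0f : s);
}

/* SoA: the R plane is contiguous, so the loop walks memory with stride 1. */
void scale_plane(uint8_t *plane, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i)
        plane[i] = scale_u8(plane[i], factor);
}

/* AoS (interleaved r,g,b,r,g,b,...): only every third byte is used,
   so two thirds of each fetched cache line is wasted for this task. */
void scale_r_interleaved(uint8_t *rgb, size_t n_pixels, float factor)
{
    for (size_t i = 0; i < n_pixels; ++i)
        rgb[3 * i] = scale_u8(rgb[3 * i], factor);
}
```

With the SoA layout, a call such as `scale_plane(SoA.r, N, 1.5f)` would touch only the bytes it needs.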

One further consideration is padding/alignment. In the RGB example above, each element of the AoS layout is 3 bytes long, so elements start at multiples of 3 bytes, which may not be convenient for CUDA, SIMD, etc. In some cases this may even require padding within the struct to make the alignment more convenient (e.g. adding a dummy uint8_t element to ensure 4-byte alignment). In the SoA case, however, the planes are byte aligned, which can be more convenient for certain algorithms/architectures.
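The padding idea can be sketched like this. The type names `Rgb3` and `Rgb4` and the helper functions are hypothetical; the sizes noted in the comments are what common ABIs produce for structs of `uint8_t`, though the C standard does not strictly guarantee them.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpadded pixel: typically 3 bytes, so element i starts at byte 3*i. */
typedef struct {
    uint8_t r, g, b;
} Rgb3;

/* Padded pixel: a dummy byte brings the size to 4,
   so element i starts at a multiple of 4 bytes. */
typedef struct {
    uint8_t r, g, b, pad;
} Rgb4;

/* Byte offset of element i within an array of each type. */
size_t offset_unpadded(size_t i) { return i * sizeof(Rgb3); }
size_t offset_padded(size_t i)   { return i * sizeof(Rgb4); }
```

The trade-off is one extra byte per pixel (about 33% more memory) in exchange for elements that never straddle a 4-byte boundary.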

For most image-processing applications the AoS scenario is much more common, but for other applications, or for specific image-processing tasks, this may not always be the case. When there is no obvious choice, I would recommend AoS as the default.

See also this answer for a more general discussion of AoS vs SoA.
