Is coalescing triggered when accessing memory in reverse order?

Question

Let's say I have several threads and they access memory at addresses A+0, A+4, A+8, A+12 (each access = next thread). Such access is coalesced, right?

However, if I access the same memory but in reverse order, meaning:

thread 0 -> A+12
thread 1 -> A+8
thread 2 -> A+4
thread 3 -> A+0

Will this access also be coalesced?

Recommended answer

Yes. For cc 2.0 and newer GPUs, coalescing will occur for any arrangement (including a random permutation) of 32-bit data elements across threads, as long as all the requested 32-bit data elements come from the same 128-byte, 128-byte-aligned region of global memory.
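
To make the rule concrete, here is a minimal host-side sketch, assuming a simplified model in which a warp's 32-bit loads coalesce exactly when every address falls in one 128-byte-aligned segment (it ignores cache-line and sector details). The function name segments_touched and the base address 0x1000 are hypothetical, chosen only for illustration.

```python
# Simplified model of the rule in the answer (cc 2.0+): one warp's 32-bit
# loads coalesce when they all fall inside a single 128-byte-aligned region.
SEGMENT_BYTES = 128  # size and alignment of the region

def segments_touched(addresses):
    """Return the sorted base addresses of the 128-byte-aligned segments
    touched by a warp, given one byte address per thread."""
    return sorted({addr - addr % SEGMENT_BYTES for addr in addresses})

A = 0x1000  # hypothetical, 128-byte-aligned base address (4096)

forward = [A + 4 * t for t in range(4)]        # thread t -> A + 4t
reverse = [A + 12 - 4 * t for t in range(4)]  # thread t -> A + 12 - 4t
full_warp_reversed = [A + 4 * (31 - t) for t in range(32)]  # 32 words, reversed
straddling = [A + 120 + 4 * t for t in range(4)]  # crosses a 128-byte boundary

print(segments_touched(forward))             # [4096]
print(segments_touched(reverse))             # [4096]
print(segments_touched(full_warp_reversed))  # [4096]
print(segments_touched(straddling))          # [4096, 4224]
```

Thread order never enters the calculation, only the set of addresses does, which is the point of the answer: the forward and reversed patterns touch the same single segment, while a pattern that crosses a 128-byte boundary touches two.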

The GPU has something like a "crossbar switch" in the memory controller that will distribute elements as needed. You may be interested in this GPU webinar which discusses coalescing and will illustrate this particular case pictorially (on slide 12).
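
The "crossbar" idea can be pictured with a toy model, assuming (as a simplification, not a description of the actual hardware) that one 128-byte segment is fetched as 32 words and each thread is then handed the word at its own offset. The function crossbar_deliver, the base address, and the memory contents are all hypothetical.

```python
# Toy model of the "crossbar" idea: once a 128-byte segment has been fetched,
# each thread is handed the 32-bit word at its own offset, in any order.
SEGMENT_BYTES = 128
WORD_BYTES = 4

def crossbar_deliver(segment_words, segment_base, addresses):
    """Return the word each thread receives from one fetched segment.

    segment_words: the 32 words of the segment, in memory order.
    addresses: one byte address per thread (thread t -> addresses[t])."""
    delivered = []
    for addr in addresses:
        offset = addr - segment_base
        if not (0 <= offset < SEGMENT_BYTES) or offset % WORD_BYTES:
            raise ValueError(f"address {addr:#x} is not a word in this segment")
        delivered.append(segment_words[offset // WORD_BYTES])
    return delivered

base = 0x1000  # hypothetical 128-byte-aligned segment base
words = [100 + i for i in range(SEGMENT_BYTES // WORD_BYTES)]  # made-up contents

forward = [base + 4 * t for t in range(4)]
reverse = [base + 12 - 4 * t for t in range(4)]
print(crossbar_deliver(words, base, forward))  # [100, 101, 102, 103]
print(crossbar_deliver(words, base, reverse))  # [103, 102, 101, 100]
```

In this model the reversed pattern costs the same single fetch as the forward one; only the routing of words to threads changes.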

The NVIDIA webinar page has other useful webinars you may be interested in as well.

For pre-cc 2.0 devices the specifics vary by compute capability, but devices of compute capability 1.0 and 1.1 cannot coalesce reads that are in "reverse order" or in random order.
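
For contrast, here is a sketch of the stricter cc 1.0/1.1 rule for 32-bit loads as I recall it from older editions of the CUDA C Programming Guide (verify against the guide): coalescing is evaluated per half-warp of 16 threads, thread k must read the k-th word of one 64-byte-aligned segment, and threads may sit out; otherwise a separate transaction is issued per thread. The function coalesced_cc1x and the base address are hypothetical, and the model covers only this one case.

```python
# Simplified check of the strict coalescing rule for 32-bit loads on compute
# capability 1.0/1.1, evaluated per half-warp (16 threads).
HALF_WARP = 16
WORD_BYTES = 4
SEGMENT_BYTES = HALF_WARP * WORD_BYTES  # 64-byte segment for 32-bit words

def coalesced_cc1x(addresses):
    """addresses[k] is the byte address read by thread k of the half-warp,
    or None if thread k does not participate. Returns True only if every
    participating thread k reads the k-th word of one 64-byte-aligned segment."""
    if len(addresses) > HALF_WARP:
        raise ValueError("expected at most one half-warp of addresses")
    # The segment base implied by each participating thread's address.
    bases = {addr - k * WORD_BYTES
             for k, addr in enumerate(addresses) if addr is not None}
    return len(bases) == 1 and bases.pop() % SEGMENT_BYTES == 0

A = 0x1000  # hypothetical 64-byte-aligned base address

forward = [A + 4 * k for k in range(HALF_WARP)]
reverse = [A + 60 - 4 * k for k in range(HALF_WARP)]
sparse = [A + 4 * k if k % 2 == 0 else None for k in range(HALF_WARP)]
question = [A + 12, A + 8, A + 4, A + 0] + [None] * 12  # pattern from the question

print(coalesced_cc1x(forward))   # True
print(coalesced_cc1x(reverse))   # False
print(coalesced_cc1x(sparse))    # True
print(coalesced_cc1x(question))  # False
```

Under this rule the question's reversed pattern fails because thread 0 reads word 3 rather than word 0, even though all four addresses lie in the same segment.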
