From non-coalesced access to coalesced memory access in CUDA


Problem description

I was wondering if there is any simple way to transform a non-coalesced memory access into a coalesced one. Let's take the example of this array:

dW[[w0,w1,w2][w3,w4,w5][w6,w7][w8,w9]]

Now, I know that if thread 0 in block 0 accesses dW[0] and then thread 1 in block 0 accesses dW[1], that's a coalesced access to global memory. The problem is that I have two operations. The first one is coalesced as described above, but the second one is not, because thread 1 in block 0 needs to operate on dW[0], dW[1] and dW[2].

I know that the initial layout of the container allows or forbids coalesced access, but dW is a very big array and I can't transform it during the process.

Do you know if it's possible to alleviate this problem?

Answer

You can try to use shared memory; that might work (or not, it's hard to tell without an example).

For instance, say the first operation accesses coalesced data and the second one strides a lot; this may speed things up:

__shared__ int shared[BLOCK_SIZE];
// Load global -> shared with a coalesced access; you may need to load a bit
// more before/after (a halo) depending on your application
shared[tid] = global[some_id];
__syncthreads();
// Do the math with the coalesced access
function0(shared[tid]);
// Do the math with the non-coalesced access
function1(shared[tid + 1 /* or whatever offset you need */]);

The idea is to load the data into shared memory in a coalesced manner and then use shared memory to do the math, since coalesced access doesn't matter for shared memory (bank conflicts do, on the other hand, but those are usually fine).
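
As a slightly more complete sketch of that staging pattern (purely illustrative: it assumes a 1D block of BLOCK_SIZE threads and uses the sum of three neighbouring elements to stand in for whatever the real second operation does):

#define BLOCK_SIZE 256

__global__ void stencil_shared(const float *dW, float *out, int n)
{
    __shared__ float tile[BLOCK_SIZE + 2];            // +2 for a one-element halo on each side

    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index into dW
    int lid = threadIdx.x + 1;                        // local index, shifted past the left halo

    if (gid < n)
        tile[lid] = dW[gid];                          // coalesced load of the block's tile
    if (threadIdx.x == 0 && gid > 0)
        tile[0] = dW[gid - 1];                        // left halo element
    if (threadIdx.x == blockDim.x - 1 && gid + 1 < n)
        tile[BLOCK_SIZE + 1] = dW[gid + 1];           // right halo element
    __syncthreads();

    // The "second operation": each thread now reads its neighbours from shared
    // memory instead of making strided reads from global memory.
    if (gid > 0 && gid + 1 < n)
        out[gid] = tile[lid - 1] + tile[lid] + tile[lid + 1];
}

It would be launched with something like stencil_shared<<<(n + BLOCK_SIZE - 1) / BLOCK_SIZE, BLOCK_SIZE>>>(dW, out, n); whether this actually helps depends on the real access pattern, as noted above.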

You'll have to give us more information if you want more accurate help; that's just a hint.
