GPU Shared Memory Bank Conflict

Problem description

I am trying to understand how bank conflicts take place.
Suppose I have an array of size 256 in global memory and 256 threads in a single block, and I want to copy the array to shared memory, so every thread copies one element:

shared_a[threadIdx.x] = global_a[threadIdx.x];

Does this simple action result in a bank conflict?

Now suppose that the size of the array is larger than the number of threads, so I use this loop to copy from global memory to shared memory:

int tid = threadIdx.x;
for (int i = 0; tid + i < N; i += blockDim.x)
     shared_a[tid + i] = global_a[tid + i];

Does the above code result in a bank conflict?

Solution

The best way to check this is to profile your code with the "Compute Visual Profiler", which comes with the CUDA Toolkit. There is also a great section on this in GPU Gems 3: "39.2.3 Avoiding Bank Conflicts".

"When multiple threads in the same warp access the same bank, a bank conflict occurs unless all threads of the warp access the same address within the same 32-bit word." First, on the hardware assumed in this answer (compute capability 1.x) there are 16 memory banks, each 4 bytes wide, and shared memory requests are serviced one half warp at a time; devices of compute capability 2.0 and later have 32 banks and service a whole warp per request. Successive 32-bit words are assigned to successive banks. So, essentially, if two or more threads in a half warp access different 4-byte words that live in the same shared memory bank, you get a bank conflict and the accesses are serialized.

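The bank rule can be checked on the host with a small model. This is only a sketch under the assumptions stated in the answer (4-byte-wide banks, successive 32-bit words in successive banks); the names `bank_of`, `conflict_degree` and `strided_offsets` are made up for illustration and are not part of any CUDA API. It treats accesses to the same 32-bit word as conflict-free and ignores all other hardware details.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Bank that holds a given byte offset into shared memory, assuming banks are
// 4 bytes wide and successive 32-bit words go to successive banks.
std::size_t bank_of(std::size_t byte_offset, std::size_t num_banks) {
    assert(num_banks > 0);
    return (byte_offset / 4) % num_banks;
}

// Worst-case number of distinct 32-bit words that one group of threads
// (a half warp in the answer's model) requests from a single bank.
// 1 means conflict-free; k means the request is serialized into k passes.
std::size_t conflict_degree(const std::vector<std::size_t>& byte_offsets,
                            std::size_t num_banks) {
    std::vector<std::set<std::size_t>> words_per_bank(num_banks);
    for (std::size_t offset : byte_offsets) {
        words_per_bank[bank_of(offset, num_banks)].insert(offset / 4);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        if (words.size() > worst) worst = words.size();
    }
    return worst;
}

// Byte offsets touched when thread k accesses the 4-byte word k * stride.
std::vector<std::size_t> strided_offsets(std::size_t threads,
                                         std::size_t stride) {
    std::vector<std::size_t> offsets;
    for (std::size_t k = 0; k < threads; ++k) {
        offsets.push_back(k * stride * 4);
    }
    return offsets;
}
```

With 16 banks and a group of 16 threads, `conflict_degree(strided_offsets(16, 1), 16)` models the unit-stride access in the question, while a stride of 16 words sends every thread to bank 0.
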
OK, so your first example:

First, let's assume your arrays are of type int (a 32-bit word). Your code saves these ints into shared memory so that, across any half warp, the Kth thread saves to the Kth memory bank. For example, thread 0 of the first half warp saves to shared_a[0], which is in the first memory bank, thread 1 saves to shared_a[1], and so on: each half warp has 16 threads, and these map onto the 16 4-byte banks. In the next half warp, the first thread saves its value to shared_a[16], which is in the first memory bank again. So if you use a 4-byte word such as int or float, your first example will not result in a bank conflict. If you use a 1-byte word such as char, then in the first half warp threads 0, 1, 2 and 3 all save their values to the first bank of shared memory, which causes a bank conflict on compute capability 1.x devices; later devices do not report a conflict when threads access different bytes of the same 32-bit word.
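
The effect of the element size can be counted with a few lines of host code. This is a minimal sketch, assuming 4-byte-wide banks and the compute capability 1.x behavior in which every access to the same bank by a different thread counts as a conflict; `max_threads_per_bank` is a hypothetical helper name, not a CUDA function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest number of threads in one group whose element lands in the same bank
// when thread k accesses element k of an array whose elements are `elem_size`
// bytes wide. Assumes 4-byte-wide banks; elem_size must be 1, 2, or 4.
std::size_t max_threads_per_bank(std::size_t elem_size,
                                 std::size_t group_size,
                                 std::size_t num_banks) {
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4);
    assert(num_banks > 0);
    std::vector<std::size_t> hits(num_banks, 0);
    for (std::size_t k = 0; k < group_size; ++k) {
        std::size_t byte_offset = k * elem_size;
        hits[(byte_offset / 4) % num_banks] += 1;  // bank of thread k's element
    }
    std::size_t worst = 0;
    for (std::size_t h : hits) {
        if (h > worst) worst = h;
    }
    return worst;
}
```

For a half warp of 16 threads and 16 banks, `max_threads_per_bank(4, 16, 16)` covers the int case and `max_threads_per_bank(1, 16, 16)` the char case, where four threads share each of the first four banks.
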

Second example:

Again, this all depends on the size of the word you are using; for this example I'll use a 4-byte word. So, looking at the first half warp:

Number of threads = 32

N = 64

Thread 0: will write to 0 and 32
Thread 1: will write to 1 and 33

In each iteration of the loop, the 16 threads of a half warp write to 16 consecutive 4-byte words, which fall in 16 different banks, so the writes to shared memory should not cause bank conflicts.
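
The loop can be traced on the host to see which indices each thread touches and whether any iteration puts two threads of a group in the same bank. This is a sketch only: it assumes 4-byte elements and 4-byte-wide banks, and the function names `indices_for_thread` and `copy_is_conflict_free` are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices that thread `tid` copies in the loop
//   for (int i = 0; tid + i < N; i += blockDim.x)
std::vector<std::size_t> indices_for_thread(std::size_t tid,
                                            std::size_t block_dim,
                                            std::size_t n) {
    assert(block_dim > 0);
    std::vector<std::size_t> out;
    for (std::size_t i = 0; tid + i < n; i += block_dim) {
        out.push_back(tid + i);
    }
    return out;
}

// True if, in every loop iteration, the active threads of each group of
// `group_size` consecutive threads write 4-byte words to distinct banks.
bool copy_is_conflict_free(std::size_t block_dim, std::size_t n,
                           std::size_t group_size, std::size_t num_banks) {
    assert(block_dim > 0 && group_size > 0 && num_banks > 0);
    for (std::size_t i = 0; i < n; i += block_dim) {
        for (std::size_t first = 0; first < block_dim; first += group_size) {
            std::vector<bool> used(num_banks, false);
            for (std::size_t t = first;
                 t < first + group_size && t < block_dim; ++t) {
                if (t + i >= n) continue;  // this thread has left the loop
                std::size_t bank = (t + i) % num_banks;
                if (used[bank]) return false;
                used[bank] = true;
            }
        }
    }
    return true;
}
```

Calling `indices_for_thread(0, 32, 64)` and `indices_for_thread(1, 32, 64)` reproduces the per-thread indices for the 32-thread, N = 64 case, and `copy_is_conflict_free(32, 64, 16, 16)` checks the half-warp, 16-bank model used in the answer.
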

Hope this helps, sorry for the huge reply!
