Why aren't there bank conflicts in global memory for Cuda/OpenCL?

Question

One thing I haven't figured out, and Google isn't helping me with, is why it is possible to have bank conflicts in shared memory but not in global memory. Can there be bank conflicts with registers?

UPDATE: Wow, I really appreciate the two answers from Tibbit and Grizzly. It seems that I can only give a green check mark to one answer, though. I am newish to Stack Overflow. I guess I have to pick one answer as the best. Can I do something to say thank you for the answer I don't give a green check to?

Answer

Short Answer: There are no bank conflicts in either global memory or in registers.

Explanation:

The key to understanding why is to grasp the granularity of the operations. A single thread does not access global memory on its own. Global memory accesses are "coalesced": since global memory is so slow, the accesses made by the threads within a block are grouped together so that as few requests as possible are made to global memory.
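To make the grouping concrete, here is a minimal host-side sketch in C++ that counts how many aligned memory segments a group of threads would touch. It is a simplified model, not CUDA API code: the function name `segments_touched`, the 4-byte word size, and the 128-byte segment size are assumptions chosen for illustration, and real devices differ in how they form transactions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical model (not a CUDA API): thread t reads one 4-byte word at
// byte offset t * stride_words * 4. The function returns how many distinct
// aligned segments of segment_bytes those reads fall into. Fewer segments
// means the accesses can be grouped into fewer memory requests.
std::size_t segments_touched(std::size_t num_threads,
                             std::size_t stride_words,
                             std::size_t segment_bytes = 128) {
    assert(segment_bytes > 0);  // assumed segment size must be positive
    std::set<std::uint64_t> segments;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::uint64_t byte_address =
            static_cast<std::uint64_t>(t) * stride_words * 4;
        segments.insert(byte_address / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, 32 threads reading consecutive words fall into a single segment, while the same 32 threads reading with a stride of 32 words fall into 32 separate segments. The cost of a poor pattern shows up as extra requests rather than as a bank conflict.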

Shared memory can be accessed by threads simultaneously. When two threads attempt to access an address within the same bank, this causes a bank conflict.

Registers cannot be accessed by any thread except the one to which they are allocated. Since you can't read or write to my registers, you can't block me from accessing them; hence, there aren't any bank conflicts.

Who can read & write to global memory?

Only blocks. A single thread can make an access, but the transaction will be processed at the block level (actually the warp / half-warp level, but I'm trying not to be complicated). If two blocks access the same memory, I don't believe it will take longer, and it may even be accelerated by the L1 cache on the newest devices, though this isn't transparently evident.

Who can read & write to shared memory?

Any thread within a given block. If you only have 1 thread per block you can't have a bank conflict, but you won't have reasonable performance. Bank conflicts occur because a block is allocated several threads, say 512, and they're all vying for different addresses within the same bank (not quite the same address). There are some excellent pictures of these conflicts at the end of the CUDA C Programming Guide: Figure G2, on page 167 (actually page 177 of the PDF) of version 3.2.
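The distinction between "different addresses in the same bank" and "the same address" can be sketched with a small host-side model in C++. This is an illustration under stated assumptions, not CUDA API code: the names `bank_of`, `conflict_degree`, and `strided_access` are hypothetical, the model assumes 32 banks with successive 4-byte words mapped to successive banks, and it assumes that reads of the same word are served once for all requesting threads. Bank counts and same-address behavior vary by device.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical model: successive 4-byte words map to successive banks.
std::size_t bank_of(std::size_t word_index, std::size_t num_banks = 32) {
    assert(num_banks > 0);  // assumed bank count must be positive
    return word_index % num_banks;
}

// Returns the largest number of DISTINCT words requested from any one bank.
// 1 means conflict-free; n means an n-way conflict in this model.
// Repeated requests for the same word count once (assumed same-address case).
std::size_t conflict_degree(const std::vector<std::size_t>& word_indices,
                            std::size_t num_banks = 32) {
    std::map<std::size_t, std::set<std::size_t>> words_per_bank;
    for (std::size_t w : word_indices) {
        words_per_bank[bank_of(w, num_banks)].insert(w);
    }
    std::size_t worst = 0;
    for (const auto& entry : words_per_bank) {
        if (entry.second.size() > worst) {
            worst = entry.second.size();
        }
    }
    return worst;
}

// Word indices requested when thread t reads word t * stride.
std::vector<std::size_t> strided_access(std::size_t num_threads,
                                        std::size_t stride) {
    std::vector<std::size_t> indices;
    for (std::size_t t = 0; t < num_threads; ++t) {
        indices.push_back(t * stride);
    }
    return indices;
}
```

In this model, a stride of 1 gives each of 32 threads its own bank, a stride of 2 puts two different words in each even-numbered bank, and a stride of 32 sends all 32 threads to different words in bank 0. If every thread reads the same word, the degree stays at 1, which matches the "not quite the same address" qualification.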

Who can read & write to registers?

Only the specific thread to which it is allocated. Hence only one thread is accessing it at one time.
