What is a bank conflict? (CUDA/OpenCL programming)

Question

I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. Can anybody help me understand it? I have no preference if the help is in the context of CUDA/OpenCL or just bank conflicts in general in computer science.

Recommended answer

On NVIDIA GPUs (and AMD GPUs, for that matter), local memory (called shared memory in CUDA) is divided into memory banks. Each bank can service only one address at a time, so if several threads of a half-warp try to load or store data in the same bank, the accesses have to be serialized (this is a bank conflict). GT200 GPUs have 16 banks (Fermi has 32), and AMD GPUs have 16 or 32 banks (57xx or higher: 32, everything below: 16). The banks are interleaved with a granularity of 32 bits, so bytes 0-3 are in bank 1, bytes 4-7 in bank 2, and so on; with 16 banks, bytes 64-67 are in bank 1 again. For a better visualization, it basically looks like this:

Bank    |      1      |      2      |      3      |...
Address |  0  1  2  3 |  4  5  6  7 |  8  9 10 11 |...
Address | 64 65 66 67 | 68 69 70 71 | 72 73 74 75 |...
...
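
The mapping in the table can be written down as a one-line formula. The following is a minimal host-side sketch in Python, not GPU code; the names `bank_of` and `WORD_BYTES` are made up for illustration, and the default of 16 banks and the 1-based bank numbering are taken from the table above.

```python
WORD_BYTES = 4  # banks are interleaved at 32-bit (4-byte) granularity


def bank_of(byte_address, num_banks=16):
    """Return the 1-based bank number (as in the table above) for a byte address."""
    return (byte_address // WORD_BYTES) % num_banks + 1


# Print the bank for a few of the byte addresses shown in the table.
for addr in (0, 3, 4, 8, 63, 64, 67, 68):
    print(f"byte {addr:2d} -> bank {bank_of(addr)}")
```

With 16 banks the pattern repeats every 64 bytes, which is why byte 64 lands in bank 1 again. Passing `num_banks=32` models the 32-bank case, where the pattern repeats every 128 bytes.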

So if each thread in a half-warp accesses successive 32-bit values, there are no bank conflicts. The exception to the rule that every thread must access its own bank is the broadcast: if all threads access the same address, the value is read only once and broadcast to all threads (on GT200 it has to be all threads in the half-warp accessing the same address; if I recall correctly, Fermi and AMD GPUs can do this for any number of threads accessing the same value).
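
To make the rule concrete, here is a small Python model (again a host-side sketch, not GPU code) that counts how many serialized passes one half-warp's access pattern would need. The function name `conflict_degree` is hypothetical, and the model makes a simplifying assumption: any number of threads reading the same 32-bit word count as one broadcast access, which is the more permissive behavior described above for Fermi and AMD GPUs, not the stricter GT200 rule.

```python
from collections import defaultdict

WORD_BYTES = 4  # banks are interleaved at 32-bit (4-byte) granularity


def conflict_degree(byte_addresses, num_banks=16):
    """Return the number of serialized passes for one group of threads.

    Simplified model: threads that hit the same 32-bit word share one
    broadcast access; distinct words in the same bank are serialized.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        words_per_bank[word % num_banks].add(word)
    # The busiest bank determines how many passes are needed.
    return max(len(words) for words in words_per_bank.values())


threads = range(16)  # one half-warp
successive = [4 * t for t in threads]    # thread t reads 32-bit word t
stride_two = [8 * t for t in threads]    # thread t reads word 2*t
same_bank = [64 * t for t in threads]    # thread t reads word 16*t
broadcast = [0 for _ in threads]         # every thread reads word 0

for name, pattern in [("successive", successive), ("stride 2", stride_two),
                      ("stride 16", same_bank), ("broadcast", broadcast)]:
    print(name, conflict_degree(pattern))
```

In this model, successive words and the broadcast case both need a single pass, a stride of two words puts two distinct words in each even bank, and a stride of 16 words sends every thread to the same bank. Real hardware has further details that this sketch ignores.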
