What is a bank conflict? (Doing CUDA/OpenCL programming)

Question

I have been reading the programming guides for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject itself. Can anybody help me understand it? I have no preference whether the help is in the context of CUDA/OpenCL or about bank conflicts in general in computer science.

Recommended answer

On NVIDIA GPUs (and AMD GPUs, for that matter), the local memory (this is OpenCL's term; CUDA calls it shared memory) is divided into memory banks. Each bank can service only one address at a time, so if more than one thread of a half-warp tries to load or store data in the same bank, the accesses have to be serialized (this is a bank conflict). GT200 GPUs have 16 banks (Fermi has 32), and AMD GPUs have 16 or 32 banks (57xx or higher: 32, everything below: 16). The banks are interleaved with a granularity of 32 bits, so with 16 banks bytes 0-3 are in bank 1, bytes 4-7 in bank 2, ..., bytes 64-67 in bank 1 again, and so on. For a better visualization, it basically looks like this:

Bank    |      1      |      2      |      3      |...
Address |  0  1  2  3 |  4  5  6  7 |  8  9 10 11 |...
Address | 64 65 66 67 | 68 69 70 71 | 72 73 74 75 |...
...
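
The mapping in the table can be written as a one-line formula. The following is a minimal C++ sketch that runs on the host, not on the GPU. The names `bank_of` and `kBankWidthBytes` are made up for illustration, and banks are numbered from 1 only to match the table above.

```cpp
#include <cstdint>

// Width of one bank entry: the answer states a 32-bit (4-byte) granularity.
constexpr std::uint32_t kBankWidthBytes = 4;

// Hypothetical helper: returns the 1-based bank that holds the given byte
// address, assuming banks are interleaved word by word as in the table.
std::uint32_t bank_of(std::uint32_t byte_address, std::uint32_t num_banks) {
    return (byte_address / kBankWidthBytes) % num_banks + 1;
}
```

With 16 banks, `bank_of(0, 16)` and `bank_of(64, 16)` both give bank 1, while `bank_of(68, 16)` gives bank 2, which is the wrap-around shown in the second address row.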

So if each thread in a half-warp accesses successive 32-bit values, there are no bank conflicts. An exception to the rule that every thread must access its own bank is a broadcast: if all threads access the same address, the value is read only once and broadcast to all threads. On GT200 it has to be all threads in the half-warp accessing the same address; IIRC, Fermi and AMD GPUs can do this for any number of threads accessing the same value.
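
To make the effect of the access pattern concrete, here is a rough host-side C++ sketch that counts how many serialized passes a group of threads would need. It is a simplified model and not a description of any particular GPU: it assumes that threads reading the same 32-bit word share one broadcast read (closer to the Fermi/AMD behaviour mentioned above than to GT200), and the names `strided_access` and `conflict_degree` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: thread t accesses the 32-bit word with index t * stride.
std::vector<unsigned> strided_access(unsigned num_threads, unsigned stride) {
    std::vector<unsigned> words(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        words[t] = t * stride;
    }
    return words;
}

// Simplified model: a bank needs one pass per distinct word requested from it,
// so the busiest bank determines how many passes the whole access takes.
// A result of 1 means no bank conflict.
std::size_t conflict_degree(const std::vector<unsigned>& words,
                            unsigned num_banks) {
    std::vector<std::set<unsigned>> per_bank(num_banks);
    for (unsigned w : words) {
        per_bank[w % num_banks].insert(w);  // identical words collapse (broadcast)
    }
    std::size_t worst = 0;
    for (const auto& distinct_words : per_bank) {
        worst = std::max(worst, distinct_words.size());
    }
    return worst;
}
```

Under this model, a half-warp of 16 threads with 16 banks needs one pass for stride 1 (successive words) and one pass for stride 0 (every thread reads the same word), but two passes for stride 2 and sixteen passes for stride 16, where every thread hits the same bank at a different address.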
