How is 2D Shared Memory arranged in CUDA


Question

I've always worked with linear shared memory (load, store, access neighbours), but I made a simple test in 2D to study bank conflicts, and the results have confused me.

The following code reads data from a one-dimensional global memory array into shared memory and copies it back from shared memory to global memory:

__global__ void update(int* gIn, int* gOut, int w) {
    // shared memory space
    __shared__ int shData[16][16];
    // map from threadIdx/blockIdx to data position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // calculate the global id into the one dimensional array
    int gid = x + y * w;

    // load shared memory
    shData[threadIdx.x][threadIdx.y] = gIn[gid];
    // synchronize threads (not really needed here, but kept for convenience)
    __syncthreads();
    // write data back to global memory
    gOut[gid] = shData[threadIdx.x][threadIdx.y];
}

The Visual Profiler reported bank conflicts in shared memory. The following code avoids those conflicts (only the differences are shown):

// load shared memory
shData[threadIdx.y][threadIdx.x] = gIn[gid];

// write data back to global memory
gOut[gid] = shData[threadIdx.y][threadIdx.x];
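
Applying those two changed lines to the original kernel gives the complete conflict-free version:

__global__ void update(int* gIn, int* gOut, int w) {
    __shared__ int shData[16][16];
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int gid = x + y * w;

    // row index first, column index second: the threads of a
    // half-warp now touch 16 consecutive banks
    shData[threadIdx.y][threadIdx.x] = gIn[gid];
    __syncthreads();
    gOut[gid] = shData[threadIdx.y][threadIdx.x];
}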

This behavior confused me, because in Programming Massively Parallel Processors: A Hands-on Approach we can read:

"matrix elements in C and CUDA are placed into the linearly addressed locations according to the row major convention. That is, the elements of row 0 of a matrix are first placed in order into consecutive locations."

Is this related to the shared memory arrangement, or to the thread indices? Am I missing something?

The kernel configuration is as follows:

// kernel configuration
dim3 dimBlock  = dim3 ( 16, 16, 1 );
dim3 dimGrid   = dim3 ( 64, 64 );
// Launching a grid of 64x64 blocks with 16x16 threads -> 1048576 threads
update<<<dimGrid, dimBlock>>>(d_input, d_output, 1024);
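
For context, a minimal host-side harness for this configuration might look like the sketch below; the h_input/h_output buffers and the cleanup are my assumptions, not part of the original question (error checking omitted for brevity):

// problem size from the question: 1024 x 1024 ints
int w = 1024, h = 1024;
size_t bytes = (size_t)w * h * sizeof(int);

int *d_input, *d_output;
cudaMalloc(&d_input, bytes);
cudaMalloc(&d_output, bytes);

// h_input / h_output are hypothetical host buffers of w * h ints
cudaMemcpy(d_input, h_input, bytes, cudaMemcpyHostToDevice);

dim3 dimBlock(16, 16, 1);
dim3 dimGrid(64, 64);
update<<<dimGrid, dimBlock>>>(d_input, d_output, w);

cudaMemcpy(h_output, d_output, bytes, cudaMemcpyDeviceToHost);
cudaFree(d_input);
cudaFree(d_output);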

Thanks in advance.

Answer

Yes, shared memory is arranged in row-major order, as you expected. So your [16][16] array is stored row-wise, something like this:

       bank0 .... bank15
row 0  [ 0   .... 15  ]
    1  [ 16  .... 31  ]
    2  [ 32  .... 47  ]
    3  [ 48  .... 63  ]
    4  [ 64  .... 79  ]
    5  [ 80  .... 95  ]
    6  [ 96  .... 111 ]
    7  [ 112 .... 127 ]
    8  [ 128 .... 143 ]
    9  [ 144 .... 159 ]
    10 [ 160 .... 175 ]
    11 [ 176 .... 191 ]
    12 [ 192 .... 207 ]
    13 [ 208 .... 223 ]
    14 [ 224 .... 239 ]
    15 [ 240 .... 255 ]
       col 0 .... col 15

Because there are 16 32-bit shared memory banks on pre-Fermi hardware, every integer entry in each column maps onto one shared memory bank. So how does that interact with your choice of indexing scheme?
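
To make the mapping concrete: a 4-byte word at linear offset n in shared memory lives in bank n % 16 on this hardware. For the first half-warp of a 16x16 block (threadIdx.x = 0..15, threadIdx.y = 0), the two indexing schemes can be checked with a small host-side sketch (my illustration, not profiler output):

#include <stdio.h>

int main(void) {
    // For __shared__ int shData[16][16], element [r][c] sits at
    // linear word offset r * 16 + c; with 16 banks, bank = offset % 16.
    for (int tid = 0; tid < 16; ++tid) {        // threadIdx.x, with threadIdx.y == 0
        int xyBank = (tid * 16 + 0) % 16;       // shData[threadIdx.x][threadIdx.y]
        int yxBank = (0 * 16 + tid) % 16;       // shData[threadIdx.y][threadIdx.x]
        printf("thread %2d: [x][y] -> bank %2d, [y][x] -> bank %2d\n",
               tid, xyBank, yxBank);
    }
    return 0;
}

All 16 threads hit bank 0 under the first scheme, while the second spreads them across banks 0 through 15.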

The thing to keep in mind is that threads within a block are numbered in the equivalent of column-major order (technically, the x dimension of the structure is the fastest varying, followed by y, then z). So when you use this indexing scheme:

shData[threadIdx.x][threadIdx.y]

threads within a half-warp will be reading from the same column, which implies reading from the same shared memory bank, and bank conflicts will occur. When you use the opposite scheme:

shData[threadIdx.y][threadIdx.x]

threads within the same half-warp will be reading from the same row, which implies reading from each of the 16 different shared memory banks, so no conflicts occur.
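
As an aside (not part of the original answer), when an algorithm genuinely needs the column-wise access pattern, for example a shared memory transpose, the standard workaround is to pad each row by one element so that consecutive rows start in different banks:

// 17 ints per row: element [r][c] sits at word offset r * 17 + c,
// so walking down a column (r varies, c fixed) advances the bank
// index by 17 % 16 = 1 per step, touching all 16 banks.
__shared__ int shData[16][17];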
