CUDA: Thread ID assignment in 2D grid


Question

Let's suppose I have a kernel call with a 2D grid, like so:

dim3 dimGrid(x, y); // not important what the actual values are
dim3 dimBlock(blockSize, blockSize);
myKernel <<< dimGrid, dimBlock >>>();



Now I've read that multidimensional grids are merely meant to ease programming - the underlying hardware will only ever use 1D linearly cached memory (unless you use texture memory, but that's not relevant here).

My question is: In what order will the threads be assigned to the grid indices during warp scheduling? Will they be assigned horizontally ("iterate" x, then y) or vertically ("iterate" y, then x)? This might be relevant to improve memory coalescing, depending on how I access my memory in the kernel.

To make it more clear, let's say the following represents the thread's IDs as applied to my (imaginary) grid with a "horizontal" distribution:

[ 0  1  2  3 ]
[ 4  5  6  7 ]
[ 8  9 10 11 ]
[ ...        ]

A "vertical" distribution would instead be:

[ 0  4  8 .. ]
[ 1  5  9 .. ]
[ 2  6 10 .. ]
[ 3  7 11 .. ]



I hope you can see how this might affect coalescing: With each variant, there will be a specific optimal way to access my device memory buffer.

Unfortunately, I have not found any detailed information on this yet.

Recommended answer

Horizontal and vertical are arbitrary labels. But threads do have well-defined x, y, and z dimensions. Threads are grouped into warps in the order x, y, z, with x varying fastest. So a 16x16 thread block will have threads in the following order in the first 32-thread warp:

warp lane: thread ID (x,y,z)


  • 0: 0,0,0
  • 1: 1,0,0
  • 2: 2,0,0
  • 3: 3,0,0
  • ...
  • 15: 15,0,0
  • 16: 0,1,0
  • 17: 1,1,0
  • 18: 2,1,0
  • 19: 3,1,0
  • ...
  • 31: 15,1,0
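
The ordering listed above can be reproduced with a little arithmetic. What follows is a minimal host-side C++ sketch, not kernel code: `Dim3`, `linearThreadId`, `warpIndex`, `laneIndex`, and `rowMajorIndex` are hypothetical names introduced here for illustration, and the warp size of 32 is taken from the answer. Inside a real kernel, the same expression would use the built-in `threadIdx` and `blockDim` in place of the two `Dim3` arguments.

```cpp
// Host-side stand-in for CUDA's built-in dim3/uint3 (hypothetical helper type).
struct Dim3 { unsigned x, y, z; };

// Warp size as stated in the answer (32 threads per warp).
constexpr unsigned kWarpSize = 32;

// Linear thread ID within a block: x varies fastest, then y, then z.
constexpr unsigned linearThreadId(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Which warp of the block the thread belongs to.
constexpr unsigned warpIndex(Dim3 idx, Dim3 dim) {
    return linearThreadId(idx, dim) / kWarpSize;
}

// Position (lane) of the thread inside its warp.
constexpr unsigned laneIndex(Dim3 idx, Dim3 dim) {
    return linearThreadId(idx, dim) % kWarpSize;
}

// Row-major element index: the column (taken from x) varies fastest,
// so threads with adjacent x values touch adjacent elements.
constexpr unsigned rowMajorIndex(unsigned col, unsigned row, unsigned width) {
    return row * width + col;
}

// Compile-time checks against the list in the answer (16x16 block).
static_assert(laneIndex(Dim3{15, 0, 0}, Dim3{16, 16, 1}) == 15, "lane 15 is thread (15,0,0)");
static_assert(laneIndex(Dim3{0, 1, 0}, Dim3{16, 16, 1}) == 16, "lane 16 is thread (0,1,0)");
static_assert(laneIndex(Dim3{15, 1, 0}, Dim3{16, 16, 1}) == 31, "lane 31 is thread (15,1,0)");
```

In terms of the question's diagrams, this corresponds to the "horizontal" distribution within a block. For coalescing, that suggests deriving the fastest-varying memory index from the x coordinate, as `rowMajorIndex` does, so that neighbouring lanes of a warp read neighbouring elements.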
