Parallelization in CUDA, assigning threads to each column
Question
Say I have a 1D array converted from an MxN 2D matrix, and I want to parallelize over each column and perform some operation on it. How do I assign a thread to each column?
For example, if I have a 3x3 matrix:
1 2 3
4 5 6
7 8 9
And I want to add to each number in a column depending on the column # (so the 1st column will add 1, the 2nd will add 2, ...), so it becomes:
1+1 2+1 3+1
4+2 5+2 6+2
7+3 8+3 9+3
How do I do this in CUDA? I know how to assign threads to all the elements in the array, but I don't know how to assign a thread to each column. What I want is to send each column (1, 2, 3), (4, 5, 6), (7, 8, 9) and perform the operation on it.
Answer
In your example you are actually adding numbers based on the row, not the column. Still, you know the dimensions of the matrix (you know it's MxN). What you could do is something like:
__global__ void MyAddingKernel(int* matrix, int M, int N)
{
    int gid = threadIdx.x + blockDim.x * blockIdx.x;
    if (gid < M * N) {
        // Row index of element gid in a row-major M x N layout
        int row = gid / N;
        // Add the 1-based row number, as in the example above
        matrix[gid] += row + 1;
        // To add the 1-based column number instead:
        // matrix[gid] += gid % N + 1;
    }
}
If you wanted to add a different number, you could do something like:
matrix[gid] += my_col_number_function(gid % N);