Parallelization in CUDA, assigning threads to each column

Question

Say I have a 1D array converted from an MxN 2D matrix, and I want to parallelize each column and do some operations. How do I assign a thread to each column?

For example, if I have a 3x3 matrix:

1  2  3
4  5  6
7  8  9

And I want to add a number to each element of a column depending on the column number (so the 1st column adds 1, the 2nd adds 2, and so on), so that it becomes:

1+1   2+1   3+1
4+2   5+2   6+2
7+3   8+3   9+3

How do I do this in CUDA? I know how to assign a thread to every element in the array, but I don't know how to assign a thread to each column. So, what I want is to send each column (1, 2, 3) (4, 5, 6) (7, 8, 9) and do the operation.

Recommended answer

In your example you are adding numbers based on the row, not the column. Still, you know the dimensions of the matrix (you know it's MxN), so each thread can work out its row and column from its global index. What you could do is something like:

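__global__ void MyAddingKernel(int* matrix, int M, int N)
{
    // One thread per element; the M x N matrix is assumed to be stored row-major
    int gid = threadIdx.x + blockDim.x*blockIdx.x;
    // Skip extra threads when the grid is larger than the matrix
    if (gid >= M * N) return;
    //Let's add the (zero-based) row number to each element
    matrix[ gid ] += gid / N;
    //Let's add the (zero-based) column number to each element
    matrix[ gid ] += gid % N;
}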
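
The kernel above uses one thread per element. If one thread per column is what is actually wanted, as the question asks, a minimal sketch could look like the following. It assumes the 1D array is stored row-major and that column numbers are 1-based, as in the question. The names `AddColumnNumberPerColumn` and `addColumnNumbers` are hypothetical and are not part of the original answer.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per column of a row-major M x N matrix.
__global__ void AddColumnNumberPerColumn(int* matrix, int M, int N)
{
    int col = threadIdx.x + blockDim.x * blockIdx.x;
    if (col >= N) return;  // skip threads past the last column

    // Each thread walks down its own column. Within a row, neighbouring
    // threads touch neighbouring addresses, so the accesses coalesce.
    for (int row = 0; row < M; ++row) {
        matrix[row * N + col] += col + 1;  // 1-based column number
    }
}

// Hypothetical host helper: copy to the device, launch, copy back.
cudaError_t addColumnNumbers(std::vector<int>& host, int M, int N)
{
    int* dev = nullptr;
    size_t bytes = host.size() * sizeof(int);

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                         // threads per block
        int blocks = (N + threads - 1) / threads;  // enough blocks to cover N columns
        AddColumnNumberPerColumn<<<blocks, threads>>>(dev, M, N);
        err = cudaGetLastError();                  // catches launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

Calling `addColumnNumbers` on the 3x3 matrix from the question with `M = 3` and `N = 3` should turn it into 2 4 6 / 5 7 9 / 8 10 12. With only N threads doing work, this layout uses the GPU well only when the matrix has many columns; for small N the one-thread-per-element kernel is usually the better fit.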

If you wanted to add a different number, you could do something like:

matrix[ gid ] += my_col_number_function(gid%N);
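
The answer does not define `my_col_number_function`. As a sketch only, it could be a `__device__` function that maps a zero-based column index to the value to add. The body below (ten times the 1-based column number) is an arbitrary placeholder, and the kernel name `AddPerColumnValue` and the host helper `applyColumnFunction` are likewise hypothetical. Row-major storage is assumed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Placeholder body: returns 10, 20, 30, ... for columns 0, 1, 2, ...
__device__ int my_col_number_function(int col)
{
    return 10 * (col + 1);
}

// Hypothetical kernel: one thread per element of a row-major M x N matrix.
__global__ void AddPerColumnValue(int* matrix, int M, int N)
{
    int gid = threadIdx.x + blockDim.x * blockIdx.x;
    if (gid >= M * N) return;  // skip threads past the end of the array

    matrix[gid] += my_col_number_function(gid % N);  // gid % N is the column
}

// Hypothetical host helper: copy to the device, launch, copy back.
cudaError_t applyColumnFunction(std::vector<int>& host, int M, int N)
{
    int* dev = nullptr;
    size_t bytes = host.size() * sizeof(int);

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                             // threads per block
        int blocks = (M * N + threads - 1) / threads;  // cover all M*N elements
        AddPerColumnValue<<<blocks, threads>>>(dev, M, N);
        err = cudaGetLastError();                      // catches launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

With the placeholder function, the 3x3 matrix from the question should become 11 22 33 / 14 25 36 / 17 28 39.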
