3D Elementwise Matrix Multiplication in CUDA?


Question

I have a 2D Matrix Multiplication program using the following kernel:

__global__ void multKernel(int *a, int *b, int *c, int N)
{
    int column  = threadIdx.x + blockDim.x * blockIdx.x;
    int row     = threadIdx.y + blockDim.y * blockIdx.y;

    int index = row * N + column;

    if(column < N && row < N)
    {
        c[index] = a[index] * b[index];
    }
}

Now, I'd like to create a 3D matrix multiplication kernel, but I'm having trouble finding examples of how to create one (also, I'm terrible at reading mathematical formulae; it's something I need to improve on).

I know the GPU example will involve using

threadIdx.z

and so on, but I'm a bit lost with how to do it.

Could anyone point me in the right direction to either some formulae or sample code? Or even provide a basic example? I have a CPU example which should work, I think.

void matrixMult3D(int *a, int *b, int *c, int *z, int N)
{
    int index;

    for(int column = 0; column < N; column++)
    {
        for(int row = 0; row < N; row++)
        {
            for (int z = 0; z < N; z++)
            {
                index = row * N + column + z;
                c[index] = a[index] * b[index] * z[index];
            }
        }
    }
}

Am I at least on the right track?

Recommended answer

Because what you are actually doing is just an element-wise product (I hesitate to call it a Hadamard Product because that isn't defined for hyper matrices AFAIK), you don't need to do anything differently from the simplest 1D version of your kernel code. Something like this:

template<int ndim>
__global__ void multKernel(int *a, int *b, int *c, int *z, int N)
{
    // Flat index of this thread and the total number of threads in the grid
    int idx  = threadIdx.x + blockDim.x * blockIdx.x;
    int stride = blockDim.x * gridDim.x;

    // Total element count is N^ndim
    int idxmax = 1;
    #pragma unroll
    for(int i=0; i < ndim; i++) {
        idxmax *= N;
    }
    // Grid-stride loop: each thread handles every stride-th element
    for(; idx < idxmax; idx+=stride) {
       c[idx] = a[idx] * b[idx] * z[idx];
    }
}

[disclaimer: code written in browser, never compiled or run. use at own risk]
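
The answer shows only the kernel. As a rough sketch of how a grid-stride kernel of this kind might be launched from the host for the N*N*N case, assuming N > 0: the names `elementwiseProduct` and `multiply3D` are hypothetical, the block size of 256 is an arbitrary choice, and error handling is reduced to a single assert.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel name; same grid-stride pattern as the answer's kernel,
// with the flat index held in size_t to delay overflow for large N.
template<int ndim>
__global__ void elementwiseProduct(const int *a, const int *b, const int *z,
                                   int *c, int N)
{
    size_t total = 1;
    for (int i = 0; i < ndim; i++) {
        total *= N;                       // total = N^ndim elements
    }
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t idx = threadIdx.x + (size_t)blockDim.x * blockIdx.x;
         idx < total; idx += stride) {
        c[idx] = a[idx] * b[idx] * z[idx];
    }
}

// Hypothetical host helper: copies three N*N*N arrays to the device,
// runs the kernel with ndim = 3, and returns the result.
std::vector<int> multiply3D(const std::vector<int> &a,
                            const std::vector<int> &b,
                            const std::vector<int> &z, int N)
{
    size_t total = (size_t)N * N * N;
    size_t bytes = total * sizeof(int);

    int *dA, *dB, *dZ, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dZ, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dZ, z.data(), bytes, cudaMemcpyHostToDevice);

    // A 1D launch is enough because the data is one contiguous block.
    int threads = 256;
    int blocks = (int)((total + threads - 1) / threads);
    elementwiseProduct<3><<<blocks, threads>>>(dA, dB, dZ, dC, N);
    assert(cudaGetLastError() == cudaSuccess);   // launch-configuration check only

    std::vector<int> c(total);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dZ);
    cudaFree(dC);
    return c;
}
```

Because of the grid-stride loop, the kernel would still cover every element if `blocks` were capped at a smaller number; the one-thread-per-element launch here is just the simplest choice.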

The multKernel template above would work for arrays of any dimensionality, with N (ndim=1), N*N (ndim=2), N*N*N (ndim=3), etc. elements.
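
For comparison, if explicit 3D indexing with `threadIdx.z` is still wanted, the sketch below shows one possible layout. Note that the CPU loop in the question computes `row * N + column + z`, which maps different (row, column, z) triples to the same slot, and its loop variable `z` shadows the pointer parameter `z`. The version here assumes a contiguous N*N*N array indexed as `(depth * N + row) * N + column`; the names `flatIndex`, `multKernel3D`, `matrixMult3DHost`, and `multiply3DIndexed`, and the 8x8x4 block shape, are illustrative choices rather than anything from the original post.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Flat offset of element (row, column, depth) in a contiguous N*N*N array.
__host__ __device__ inline int flatIndex(int row, int column, int depth, int N)
{
    return (depth * N + row) * N + column;
}

// One thread per element, using all three thread/block dimensions.
__global__ void multKernel3D(const int *a, const int *b, const int *z,
                             int *c, int N)
{
    int column = threadIdx.x + blockDim.x * blockIdx.x;
    int row    = threadIdx.y + blockDim.y * blockIdx.y;
    int depth  = threadIdx.z + blockDim.z * blockIdx.z;

    if (column < N && row < N && depth < N) {
        int index = flatIndex(row, column, depth, N);
        c[index] = a[index] * b[index] * z[index];
    }
}

// CPU reference using the same flat index, for checking the GPU result.
void matrixMult3DHost(const int *a, const int *b, const int *z, int *c, int N)
{
    for (int depth = 0; depth < N; depth++) {
        for (int row = 0; row < N; row++) {
            for (int column = 0; column < N; column++) {
                int index = flatIndex(row, column, depth, N);
                c[index] = a[index] * b[index] * z[index];
            }
        }
    }
}

// Host helper that launches the 3D-indexed kernel (error checks omitted).
std::vector<int> multiply3DIndexed(const std::vector<int> &a,
                                   const std::vector<int> &b,
                                   const std::vector<int> &z, int N)
{
    size_t total = (size_t)N * N * N;
    size_t bytes = total * sizeof(int);

    int *dA, *dB, *dZ, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dZ, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dZ, z.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(8, 8, 4);                       // 256 threads per block
    dim3 blocks((N + threads.x - 1) / threads.x,
                (N + threads.y - 1) / threads.y,
                (N + threads.z - 1) / threads.z);
    multKernel3D<<<blocks, threads>>>(dA, dB, dZ, dC, N);

    std::vector<int> c(total);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dZ);
    cudaFree(dC);
    return c;
}
```

Both approaches compute the same result for an element-wise product; the 3D indexing mainly pays off when a kernel needs the individual row, column, and depth coordinates, which this one does not.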
