3D Elementwise Matrix Multiplication in CUDA?
Question
I have a 2D Matrix Multiplication program using the following kernel:
__global__ void multKernel(int *a, int *b, int *c, int N)
{
    int column = threadIdx.x + blockDim.x * blockIdx.x;
    int row = threadIdx.y + blockDim.y * blockIdx.y;
    int index = row * N + column;
    if(column < N && row < N)
    {
        c[index] = a[index] * b[index];
    }
}
Now, I'd like to create a 3D matrix multiplication kernel, but I'm having trouble finding examples of how to create one (also, I'm terrible at reading mathematical formulae; it's something I need to improve on).
I know the GPU example will involve using threadIdx.z and so on, but I'm a bit lost as to how to do it.
Could anyone point me in the right direction to either some formulae or sample code? Or even provide a basic example? I have a CPU example which should work, I think.
void matrixMult3D(int *a, int *b, int *c, int *z, int N)
{
    int index;
    for(int column = 0; column < N; column++)
    {
        for(int row = 0; row < N; row++)
        {
            for (int z = 0; z < N; z++)
            {
                index = row * N + column + z;
                c[index] = a[index] * b[index] * z[index];
            }
        }
    }
}
Am I at least on the right track?
Recommended answer
Because what you are actually doing is just an element-wise product (I hesitate to call it a Hadamard Product because that isn't defined for hyper matrices AFAIK), you don't need to do anything differently from the simplest 1D version of your kernel code. Something like this:
template<int ndim>
__global__ void multKernel(int *a, int *b, int *c, int *z, int N)
{
    // Flat global thread index and total number of threads in the grid
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    int stride = blockDim.x * gridDim.x;

    // Total element count: N^ndim
    int idxmax = 1;
    #pragma unroll
    for(int i=0; i < ndim; i++) {
        idxmax *= N;
    }

    // Grid-stride loop: each thread handles idx, idx + stride, ...
    for(; idx < idxmax; idx+=stride) {
        c[idx] = a[idx] * b[idx] * z[idx];
    }
}
[disclaimer: code written in browser, never compiled or run. use at own risk]
This kernel would work for arrays of any dimensionality, with N elements (ndim=1), N*N elements (ndim=2), N*N*N elements (ndim=3), and so on.
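To make the "no 3D indexing needed" point concrete, here is a minimal host-side C++ sketch, not part of the original answer. It assumes the N*N*N arrays are stored contiguously with the column varying fastest, so element (column, row, depth) sits at offset (depth * N + row) * N + column. The names flatIndex, elementwiseNested and elementwiseFlat are hypothetical and introduced only for illustration. Note that the expression row * N + column + z in the question's CPU loop maps several different elements to the same offset, and its loop variable z shadows the pointer parameter z, so that function would need both changes before it could serve as a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (column, row, depth) in an N*N*N array stored
// contiguously with the column varying fastest (an assumed layout).
inline std::size_t flatIndex(int column, int row, int depth, int N)
{
    assert(column >= 0 && column < N);
    assert(row >= 0 && row < N);
    assert(depth >= 0 && depth < N);
    return (static_cast<std::size_t>(depth) * N + row) * N + column;
}

// Triple-loop reference: c = a * b * z, element by element.
void elementwiseNested(const std::vector<int>& a, const std::vector<int>& b,
                       const std::vector<int>& z, std::vector<int>& c, int N)
{
    for (int depth = 0; depth < N; depth++) {
        for (int row = 0; row < N; row++) {
            for (int column = 0; column < N; column++) {
                std::size_t i = flatIndex(column, row, depth, N);
                c[i] = a[i] * b[i] * z[i];
            }
        }
    }
}

// Same computation with a single flat loop over all N*N*N elements.
// Because each output depends only on the inputs at the same offset,
// the 3D structure never has to be reconstructed.
void elementwiseFlat(const std::vector<int>& a, const std::vector<int>& b,
                     const std::vector<int>& z, std::vector<int>& c, int N)
{
    std::size_t total = static_cast<std::size_t>(N) * N * N;
    for (std::size_t i = 0; i < total; i++) {
        c[i] = a[i] * b[i] * z[i];
    }
}
```

Both functions visit every offset from 0 to N*N*N - 1 exactly once, so they produce identical output. The flat loop is the CPU analogue of the grid-stride loop in the kernel above, where the work is split across threads instead of running serially.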