How to transpose a matrix in CUDA/cublas?


Problem description

Say I have a matrix of dimension A*B on the GPU, where B (the number of columns) is the leading dimension, assuming C-style (row-major) storage. Is there any method in CUDA (or cublas) to transpose this matrix to FORTRAN style, where A (the number of rows) becomes the leading dimension?

It would be even better if the transpose could happen during the host->device transfer while keeping the original data unchanged.
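For reference, the row-major/column-major relationship the question is about can be sketched on the host like this (a plain C illustration, not from the original post; the function name is made up):

/* Copy a row-major A x B matrix (leading dimension B, C style) into a
 * column-major buffer (leading dimension A, FORTRAN/cublas style).
 * Element (i, j) moves from in[i*B + j] to out[i + j*A]. */
void rowMajorToColMajor(const float *in, float *out, int A, int B)
{
    for (int i = 0; i < A; ++i)        /* row index    */
        for (int j = 0; j < B; ++j)    /* column index */
            out[i + j * A] = in[i * B + j];
}

On the raw buffer this is exactly the same data movement a matrix transpose performs, which is why a transpose kernel answers the question.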

Recommended answer

The CUDA SDK includes a matrix transpose sample; you can see there examples of how to implement one, ranging from a naive implementation to optimized versions.

For example:

Naive transpose

// Tile parameters as used in the SDK sample: TILE_DIM x TILE_DIM tiles,
// each handled by a TILE_DIM x BLOCK_ROWS thread block.
#define TILE_DIM   32
#define BLOCK_ROWS  8

__global__ void transposeNaive(float *odata, float *idata,
                               int width, int height, int nreps)
{
    // Global coordinates of the first element this thread touches.
    int xIndex = blockIdx.x * TILE_DIM + threadIdx.x;
    int yIndex = blockIdx.y * TILE_DIM + threadIdx.y;
    int index_in  = xIndex + width * yIndex;   // read position in the input
    int index_out = yIndex + height * xIndex;  // write position in the transposed output

    // nreps just repeats the work for benchmarking in the SDK sample.
    for (int r = 0; r < nreps; r++)
    {
        // Each thread copies TILE_DIM / BLOCK_ROWS elements of its tile column.
        for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS)
        {
            odata[index_out + i] = idata[index_in + i * width];
        }
    }
}
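For completeness, a rough launch sketch (not part of the original answer): h_idata, width and height are assumed to exist, error checking is omitted, and width/height are assumed to be multiples of TILE_DIM.

// Allocate device buffers, copy the input over, and run one transpose pass.
float *d_idata, *d_odata;
size_t bytes = (size_t)width * height * sizeof(float);
cudaMalloc((void**)&d_idata, bytes);
cudaMalloc((void**)&d_odata, bytes);
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);

dim3 block(TILE_DIM, BLOCK_ROWS);                // 32 x 8 threads per block
dim3 grid(width / TILE_DIM, height / TILE_DIM);  // one block per TILE_DIM x TILE_DIM tile
transposeNaive<<<grid, block>>>(d_odata, d_idata, width, height, 1);
cudaDeviceSynchronize();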

As talonmies pointed out, in cublas matrix operations you can specify whether a matrix should be operated on as transposed or not. For example, for cublasDgemm(), which computes C = a * op(A) * op(B) + b * C, if you want to operate on A as its transpose (A^T), you specify that in the corresponding parameter ('N' for normal or 'T' for transposed).
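A minimal sketch of such a call with the cublas_v2 API, where CUBLAS_OP_T plays the role of the 'T' flag mentioned above (the function name, buffer names and dimensions here are made up for illustration):

#include <cublas_v2.h>

// Sketch: C = A^T * B. All matrices are column-major device buffers:
// A is stored k x m, B is stored k x n, and the result C is m x n.
void gemm_At_B(cublasHandle_t handle, const double *d_A, const double *d_B,
               double *d_C, int m, int n, int k)
{
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle,
                CUBLAS_OP_T, CUBLAS_OP_N,  // op(A) = A^T, op(B) = B
                m, n, k,                   // dimensions of op(A) * op(B)
                &alpha,
                d_A, k,                    // A stored k x m, lda = k
                d_B, k,                    // B stored k x n, ldb = k
                &beta,
                d_C, m);                   // C is m x n,     ldc = m
}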

