How to transpose a matrix in CUDA/cublas?
Question
Say I have a matrix of dimension A*B on the GPU, where B (the number of columns) is the leading dimension, assuming C style. Is there any method in CUDA (or cublas) to transpose this matrix to FORTRAN style, where A (the number of rows) becomes the leading dimension?
It would be even better if it could be transposed during the host->device transfer while keeping the original data unchanged.
Answer
The CUDA SDK includes a matrix transpose sample with example code showing how to implement one, ranging from a naive implementation to optimized versions.
For example, the naïve transpose:
__global__ void transposeNaive(float *odata, float *idata,
                               int width, int height, int nreps)
{
    // TILE_DIM and BLOCK_ROWS are compile-time constants from the SDK sample
    int xIndex = blockIdx.x * TILE_DIM + threadIdx.x;
    int yIndex = blockIdx.y * TILE_DIM + threadIdx.y;
    int index_in  = xIndex + width * yIndex;
    int index_out = yIndex + height * xIndex;

    // nreps only repeats the work for benchmarking purposes
    for (int r = 0; r < nreps; r++)
    {
        for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS)
        {
            odata[index_out + i] = idata[index_in + i * width];
        }
    }
}
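The naive kernel above reads coalesced but writes strided. As a sketch of what the SDK's optimized versions do, the tile can be staged through shared memory so both reads and writes are coalesced; the kernel below assumes TILE_DIM and BLOCK_ROWS are compile-time constants (32 and 8 here) and that width and height are multiples of TILE_DIM (the benchmarking nreps loop is dropped):

```cuda
#define TILE_DIM   32
#define BLOCK_ROWS 8

__global__ void transposeCoalesced(float *odata, const float *idata,
                                   int width, int height)
{
    // +1 column of padding avoids shared-memory bank conflicts
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int xIndex = blockIdx.x * TILE_DIM + threadIdx.x;
    int yIndex = blockIdx.y * TILE_DIM + threadIdx.y;
    int index_in = xIndex + yIndex * width;

    // coalesced read of a tile into shared memory
    for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS)
        tile[threadIdx.y + i][threadIdx.x] = idata[index_in + i * width];

    __syncthreads();

    // swap the block indices for the write, so the output access is
    // also coalesced; the transpose happens via the tile indexing
    xIndex = blockIdx.y * TILE_DIM + threadIdx.x;
    yIndex = blockIdx.x * TILE_DIM + threadIdx.y;
    int index_out = xIndex + yIndex * height;

    for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS)
        odata[index_out + i * height] = tile[threadIdx.x][threadIdx.y + i];
}
```

A suitable launch configuration is a grid of (width/TILE_DIM, height/TILE_DIM) blocks of (TILE_DIM, BLOCK_ROWS) threads.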
As talonmies pointed out, in cublas matrix operations you can specify whether a matrix should be treated as transposed. For example, cublasDgemm() computes C = a * op(A) * op(B) + b * C; if you want to operate on A as transposed (A^T), you select this through the corresponding parameter ('N' for normal or 'T' for transposed).
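If you need an explicit transposed copy rather than an on-the-fly op() inside a multiply, cuBLAS also offers cublasSgeam (C = alpha*op(A) + beta*op(B)). A minimal sketch, assuming d_A is an m x n column-major matrix already on the device and a cuBLAS handle has been created; d_At and the helper name are illustrative:

```cuda
#include <cublas_v2.h>

// Write the transpose of the m x n matrix d_A into the n x m matrix d_At.
// With alpha = 1 and beta = 0, cublasSgeam reduces to C = A^T; the B
// operand is not read when beta is zero.
cublasStatus_t transpose_with_geam(cublasHandle_t handle,
                                   const float *d_A, float *d_At,
                                   int m, int n)
{
    const float alpha = 1.0f;
    const float beta  = 0.0f;
    // The result C is n x m, so geam's dimensions are (n, m) and ldc = n.
    return cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                       n, m,
                       &alpha, d_A, m,
                       &beta,  d_At, n,
                       d_At, n);
}
```

This performs the transpose on the device, which is usually much faster than copying the data back and transposing on the host.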