OpenMP parallelization (Block Matrix Mult)
Question
I'm trying to implement block matrix multiplication and make it more parallel.
Here is my code:
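int i, j, jj, k, kk;
float sum;
int en = 4 * (2048 / 4);   /* largest multiple of the block size (4) that fits in 2048 */

/* Zero the result matrix in parallel. */
#pragma omp parallel for collapse(2)
for (i = 0; i < 2048; i++) {
    for (j = 0; j < 2048; j++) {
        C[i][j] = 0;
    }
}

/* Blocked multiply: 4-wide blocks in k and j (this part still runs serially). */
for (kk = 0; kk < en; kk += 4) {
    for (jj = 0; jj < en; jj += 4) {
        for (i = 0; i < 2048; i++) {
            for (j = jj; j < jj + 4; j++) {
                sum = C[i][j];
                for (k = kk; k < kk + 4; k++) {
                    sum += A[i][k] * B[k][j];
                }
                C[i][j] = sum;
            }
        }
    }
}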
I've been playing around with OpenMP, but I still haven't figured out the best way to get this done in the least amount of time.
Recommended answer
Getting good performance from matrix multiplication is a big job. Since "the best code is the code I don't have to write", a much better use of your time would be to learn how to use a BLAS library.
If you are using x86 processors, the Intel Math Kernel Library (MKL) is available for free and includes optimized, parallelized matrix multiplication operations: https://software.intel.com/en-us/articles/free-mkl
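As a rough illustration of what "using a BLAS library" looks like, here is a minimal sketch built around the standard CBLAS routine cblas_sgemm, which computes C = alpha*A*B + beta*C. The wrapper name matmul and the USE_CBLAS macro are assumptions made for this example only; they are not part of any library. The matrices are assumed to be square, row-major and stored contiguously. When USE_CBLAS is not defined, the sketch falls back to a plain triple loop so that it still compiles without a BLAS installation.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_CBLAS           /* hypothetical switch, defined with -DUSE_CBLAS */
#include <cblas.h>         /* with MKL, <mkl.h> declares the same interface */
#endif

/* Compute C = A * B for n x n row-major float matrices.
 * matmul is an illustrative wrapper name, not a library function. */
void matmul(int n, const float *A, const float *B, float *C)
{
    assert(n > 0);
#ifdef USE_CBLAS
    /* C = 1.0 * A * B + 0.0 * C; the leading dimension of each matrix is n. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0f, A, n,
                B, n,
                0.0f, C, n);
#else
    /* Reference fallback: unoptimized triple loop, used when no BLAS is linked. */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++) {
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            }
            C[(size_t)i * n + j] = sum;
        }
    }
#endif
}
```

For the 2048 x 2048 case in the question, the call would be matmul(2048, &A[0][0], &B[0][0], &C[0][0]), assuming A, B and C are ordinary contiguous float[2048][2048] arrays. The link flags depend on which BLAS implementation is installed, so check its documentation.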
(FWIW, I work for Intel, but not on MKL :-))
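If a library is not an option, one possible way to parallelize the blocked loop nest from the question is sketched below. This is not a tuned implementation and a BLAS library will most likely remain much faster. The function name block_matmul, the flat row-major layout and the bs (block size) parameter are assumptions made for this example. The idea is to open a single parallel region, keep the kk loop sequential inside it, and share the (jj, i) iterations among the threads. Each (jj, i) pair writes a distinct part of C, and the implicit barrier at the end of each worksharing loop keeps successive kk steps in order.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B for n x n row-major float matrices, using bs-wide blocks in k and j.
 * block_matmul is an illustrative name; n must be a multiple of bs. */
void block_matmul(int n, int bs, const float *A, const float *B, float *C)
{
    assert(n > 0 && bs > 0 && n % bs == 0);

    #pragma omp parallel
    {
        /* Zero C; the implicit barrier ensures this finishes before the multiply. */
        #pragma omp for collapse(2)
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                C[(size_t)i * n + j] = 0.0f;
            }
        }

        /* Every thread runs the kk loop; the (jj, i) work is split among them. */
        for (int kk = 0; kk < n; kk += bs) {
            #pragma omp for collapse(2)
            for (int jj = 0; jj < n; jj += bs) {
                for (int i = 0; i < n; i++) {
                    for (int j = jj; j < jj + bs; j++) {
                        float sum = C[(size_t)i * n + j];   /* private per thread */
                        for (int k = kk; k < kk + bs; k++) {
                            sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
                        }
                        C[(size_t)i * n + j] = sum;
                    }
                }
            }
        }
    }
}
```

Declaring the loop counters and sum inside the parallel region makes them private to each thread, which avoids the data race that the shared sum variable from the question would cause. Compile with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and the code runs serially with the same result.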