OpenMP parallelization (Block Matrix Mult)

Question

I'm attempting to implement block matrix multiplication and make it more parallel.

This is my code:

int i,j,jj,k,kk;
float sum;
int en = 4 * (2048/4);
#pragma omp parallel for collapse(2)
for(i=0;i<2048;i++) {
    for(j=0;j<2048;j++) {
        C[i][j]=0;
    }
}
for (kk=0;kk<en;kk+=4) {
    for(jj=0;jj<en;jj+=4) {
        for(i=0;i<2048;i++) {
            for(j=jj;j<jj+4;j++) {
                sum = C[i][j];
                for(k=kk;k<kk+4;k++) {
                    sum+=A[i][k]*B[k][j];
                }
                C[i][j] = sum;
            }
        }
    }
}

I've been playing around with OpenMP, but I still haven't figured out the best way to get this done in the least amount of time.

Recommended answer

Getting good performance from matrix multiplication is a big job. Since "the best code is the code I don't have to write", a much better use of your time would be to learn how to use a BLAS library.

If you are using x86 processors, the Intel Math Kernel Library (MKL) is available free and includes optimized, parallelized matrix multiplication operations. https://software.intel.com/en-us/articles/free-mkl

(FWIW, I work for Intel, but not on MKL :-))
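
For readers who still want to parallelize the hand-written loop nest from the question, here is a minimal sketch, not a tuned implementation. It assumes flat row-major arrays instead of the question's 2D arrays, and the function name block_matmul and the macro BS are hypothetical names chosen for illustration. The idea is to give each thread whole rows of C, so no two threads ever write the same element, and to declare sum inside the loop so that it is private to each thread (in the question's code it is declared outside, which would make it shared if the multiplication loops were parallelized as written).

```c
#include <assert.h>

#define BS 4  /* block size, taken from the question's code (hypothetical macro name) */

/* Computes C = A * B for n x n row-major float matrices.
 * Assumes n is a positive multiple of BS. */
void block_matmul(int n, const float *A, const float *B, float *C)
{
    assert(n > 0 && n % BS == 0);

    /* Rows of C are independent, so the outer row loop is split across threads.
     * Without OpenMP enabled the pragma is ignored and the code runs serially. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float *c_row = C + (long)i * n;

        for (int j = 0; j < n; j++) {
            c_row[j] = 0.0f;
        }
        for (int kk = 0; kk < n; kk += BS) {
            for (int jj = 0; jj < n; jj += BS) {
                for (int j = jj; j < jj + BS; j++) {
                    float sum = c_row[j];  /* declared here, so private per thread */
                    for (int k = kk; k < kk + BS; k++) {
                        sum += A[(long)i * n + k] * B[(long)k * n + j];
                    }
                    c_row[j] = sum;
                }
            }
        }
    }
}
```

Compile with your compiler's OpenMP flag (for example -fopenmp with GCC) to enable the threads. This removes the data race, but it will very likely still be much slower than an optimized BLAS routine, which is the point of the answer above.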
