Parallelized Matrix Multiplication


Question

I am trying to parallelize the multiplication of two matrices, A and B.

Unfortunately, the serial implementation is still faster than the parallel one, or the speedup is too low (with matrix dimension 512, the speedup is about 1.3). Probably something is fundamentally wrong. Can someone give me a tip?

double[][] matParallel2(final double[][] matrixA,
                        final double[][] matrixB,
                        final boolean parallel) {
    int rows = matrixA.length;
    int columnsA = matrixA[0].length;
    int columnsB = matrixB[0].length;

    Runnable task;
    List<Thread> pool = new ArrayList<>();

    double[][] returnMatrix = new double[rows][columnsB];
    for (int i = 0; i < rows; i++) {
        int finalI = i;
        task = () -> {
            for (int j = 0; j < columnsB; j++) {
                //  returnMatrix[finalI][j] = 0;
                for (int k = 0; k < columnsA; k++) {
                    returnMatrix[finalI][j] +=
                            matrixA[finalI][k] * matrixB[k][j];
                }
            }
        };
        pool.add(new Thread(task));
    }
    if (parallel) {
        for (Thread trd : pool) {
            trd.start();
        }
    } else {
        for (Thread trd : pool) {
            trd.run();
        }
    }
    try {
        for (Thread trd : pool) {
            trd.join();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return returnMatrix;
}

Recommended answer

Nothing is fundamentally wrong.

Creating a thread carries a large overhead compared with a few multiplications. For 512×512 matrices, your code currently creates 512 threads, one per row. Your CPU surely has fewer than 512 cores, so only about 8 or 16 of those threads actually run in parallel on different cores; the other ~500 still pay the creation cost without adding any parallelism.

Try to limit the number of threads to something closer to the number of CPU cores, either with your own logic or by using a framework such as the java.util.concurrent package.
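As a rough illustration of that advice, here is a minimal sketch using a fixed-size pool from java.util.concurrent. The class name PooledMatMul, the method multiply, and the choice to split the rows into one contiguous block per worker are assumptions made for this example; they are not part of the original question or answer, and the pool size taken from availableProcessors() is only a reasonable starting point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not from the original post.
class PooledMatMul {

    // Multiplies a * b with at most one worker thread per available core.
    static double[][] multiply(double[][] a, double[][] b) {
        int rows = a.length;
        int inner = a[0].length;
        int cols = b[0].length;
        double[][] result = new double[rows][cols];

        // Assumption: one thread per core is a sensible upper bound.
        int workers = Math.min(rows, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            // Each task handles a contiguous block of rows, so the number of
            // tasks is about the number of workers instead of one per row.
            int chunk = (rows + workers - 1) / workers;
            for (int start = 0; start < rows; start += chunk) {
                final int from = start;
                final int to = Math.min(rows, start + chunk);
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        for (int j = 0; j < cols; j++) {
                            double sum = 0.0;
                            for (int k = 0; k < inner; k++) {
                                sum += a[i][k] * b[k][j];
                            }
                            result[i][j] = sum;
                        }
                    }
                }));
            }
            // Wait for every block; get() also surfaces task failures.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while multiplying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

Each task writes to its own rows of the result, so no locking is needed. Whether this beats the serial loop at dimension 512 still depends on the machine, so it is worth measuring rather than assuming.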
