用C优化的矩阵乘法 [英] Optimized matrix multiplication in C

查看:152
本文介绍了用C优化的矩阵乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想比较的矩阵乘法不同的方法。
第一个是正常的方法:

I'm trying to compare different methods for matrix multiplication. The first one is normal method:

do
{
    for (j = 0; j < i; j++)
    {
        for (k = 0; k < i; k++)
        {
            suma = 0;
            for (l = 0; l < i; l++)
                suma += MatrixA[j][l]*MatrixB[l][k];
                MatrixR[j][k] = suma;
            }
        }
    }
    c++;
} while (c<iteraciones);

第二个由第一调换矩阵B,然后通过行做乘法:

The second one consist of transposing the matrix B first and then do the multiplication by rows:

int f, co;
for (f = 0; f < i; f++) {
    for ( co = 0; co < i; co++) {
        MatrixB[f][co] = MatrixB[co][f];
    }
}

c = 0;
do
{
    for (j = 0; j < i; j++)
    {
        for (k = 0; k < i; k++)
        {
            suma = 0;
            for (l = 0; l < i; l++)
                suma += MatrixA[j][l]*MatrixB[k][l];
                MatrixR[j][k] = suma;
            }
        }
     }
     c++;
} while (c<iteraciones);

应该是快很多,因为我们是在访问连续的内存插槽,但我没有得到的性能改善显著第二种方法。难道我做错了什么?

The second method supposed to be much faster, because we are accessing contiguous memory slots, but I'm not getting a significant improvement in the performance. Am I doing something wrong?

我可以张贴整个code,但我认为是没有必要的。

I can post the complete code, but I think is not needed.

推荐答案

什么每个程序员应该知道内存 (PDF链接)由乌尔里希Drepper有很多关于内存效率不错的想法,但特别是,他使用矩阵乘法如何知道关于记忆和使用知识,可以加速这一过程的一个例子。请看附录A.1在他的论文,并通过6.2.1节阅读。表6.2中的文件显示,他可以得到他的运行时间从一个天真的执行的时间有10%的1000×1000矩阵。

What Every Programmer Should Know About Memory (pdf link) by Ulrich Drepper has a lot of good ideas about memory efficiency, but in particular, he uses matrix multiplication as an example of how knowing about memory and using that knowledge can speed this process. Look at appendix A.1 in his paper, and read through section 6.2.1. Table 6.2 in the paper shows that he could get his running time to be 10% from a naive implementation's time for a 1000x1000 matrix.

当然,他最后的code是pretty毛茸茸,并且使用了大量的系统特定的东西,编译时调整了,不过,如果你的真正的需要速度,读取纸和阅读他的绝对执行是值得的。

Granted, his final code is pretty hairy and uses a lot of system-specific stuff and compile-time tuning, but still, if you really need speed, reading that paper and reading his implementation is definitely worth it.

这篇关于用C优化的矩阵乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆