C中的优化矩阵乘法 [英] Optimized matrix multiplication in C

查看:55
本文介绍了C中的优化矩阵乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试比较矩阵乘法的不同方法.第一个是普通方法:

I'm trying to compare different methods for matrix multiplication. The first one is normal method:

do
{
    for (j = 0; j < i; j++)
    {
        for (k = 0; k < i; k++)
        {
            suma = 0;
            for (l = 0; l < i; l++)
                suma += MatrixA[j][l]*MatrixB[l][k];
                MatrixR[j][k] = suma;
            }
        }
    }
    c++;
} while (c<iteraciones);

第二个包括先转置矩阵 B,然后按行进行乘法:

The second one consist of transposing the matrix B first and then do the multiplication by rows:

int f, co;
for (f = 0; f < i; f++) {
    for ( co = 0; co < i; co++) {
        MatrixB[f][co] = MatrixB[co][f];
    }
}

c = 0;
do
{
    for (j = 0; j < i; j++)
    {
        for (k = 0; k < i; k++)
        {
            suma = 0;
            for (l = 0; l < i; l++)
                suma += MatrixA[j][l]*MatrixB[k][l];
                MatrixR[j][k] = suma;
            }
        }
     }
     c++;
} while (c<iteraciones);

第二种方法应该快得多,因为我们正在访问连续的内存插槽,但我的性能没有得到显着改善.我做错了什么吗?

The second method supposed to be much faster, because we are accessing contiguous memory slots, but I'm not getting a significant improvement in the performance. Am I doing something wrong?

我可以发布完整的代码,但我认为不需要.

I can post the complete code, but I think is not needed.

推荐答案

每个程序员都应该知道的Ulrich Drepper 的 Memory (pdf 链接) 有很多关于内存效率的好主意,但特别是,他使用矩阵乘法作为一个例子,说明了解内存并使用这些知识可以加速这个过程.看他论文的附录A.1,通读6.2.1节.论文中的表 6.2 表明,对于 1000x1000 矩阵,他可以将其运行时间从简单实现的时间缩短 10%.

What Every Programmer Should Know About Memory (pdf link) by Ulrich Drepper has a lot of good ideas about memory efficiency, but in particular, he uses matrix multiplication as an example of how knowing about memory and using that knowledge can speed this process. Look at appendix A.1 in his paper, and read through section 6.2.1. Table 6.2 in the paper shows that he could get his running time to be 10% from a naive implementation's time for a 1000x1000 matrix.

诚然,他的最终代码相当冗长,并且使用了许多特定于系统的东西和编译时调优,但是,如果您真的需要速度,阅读那篇论文并阅读他的实现是绝对值得.

Granted, his final code is pretty hairy and uses a lot of system-specific stuff and compile-time tuning, but still, if you really need speed, reading that paper and reading his implementation is definitely worth it.

这篇关于C中的优化矩阵乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆