矩阵乘法:Strassen与标准 [英] Matrix multiplication: Strassen vs. Standard

查看:248
本文介绍了矩阵乘法:Strassen与标准的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我尝试使用C ++实现

matrix.cpp http://pastebin.com/wYADLJ8Y

main.cpp http://pastebin.com/48BSqGJr



g ++ main.cpp matrix.cpp -o matrix -O3

解决方案

一些想法:




  • 您是否已优化它以考虑非零大小的矩阵填充为零?我认为算法假设你不打扰乘这些术语。这就是为什么你得到运行时间在2 ^ n和2 ^(n + 1)-1之间恒定的平坦区域。通过不乘以术语你知道是零你应该能够改善这些领域。

  • 考虑一个大矩阵是任意的,并且算法只比稍微好一点,O(N ^ 3)对O(N ^ 2.8)。在尝试更大的矩阵之前,您可能看不到可测量的增益。例如,我做了一些有限元模型,其中10,000x10,000矩阵被认为是小。

  • 尝试使用各种优化级别进行测试,包括根本没有优化。

  • 该算法似乎假设乘法比加法要贵得多。这当然是真的40年前,当它是第一次发展,但我相信在更现代的处理器之间的差异加法和乘法已经变得更小。这可能会降低似乎减少乘法但增加加法的算法的有效性。

  • 你看过一些其他Strassen实现有想法吗?尝试对已知的良好实施进行基准化,以确切了解您可以获得多快。


I tried to implement the Strassen algorithm for matrix multiplication with C++, but the result isn't that, what I expected. As you can see strassen always takes more time then standard implementation and only with a dimension from a power of 2 is as fast as standard implementation. What went wrong?

matrix mult_strassen(matrix a, matrix b) {
if (a.dim() <= cut)
    return mult_std(a, b);

matrix a11 = get_part(0, 0, a);
matrix a12 = get_part(0, 1, a);
matrix a21 = get_part(1, 0, a);
matrix a22 = get_part(1, 1, a);

matrix b11 = get_part(0, 0, b);
matrix b12 = get_part(0, 1, b);
matrix b21 = get_part(1, 0, b);
matrix b22 = get_part(1, 1, b);

matrix m1 = mult_strassen(a11 + a22, b11 + b22); 
matrix m2 = mult_strassen(a21 + a22, b11);
matrix m3 = mult_strassen(a11, b12 - b22);
matrix m4 = mult_strassen(a22, b21 - b11);
matrix m5 = mult_strassen(a11 + a12, b22);
matrix m6 = mult_strassen(a21 - a11, b11 + b12);
matrix m7 = mult_strassen(a12 - a22, b21 + b22);

matrix c(a.dim(), false, true);
set_part(0, 0, &c, m1 + m4 - m5 + m7);
set_part(0, 1, &c, m3 + m5);
set_part(1, 0, &c, m2 + m4);
set_part(1, 1, &c, m1 - m2 + m3 + m6);

return c; 
}


PROGRAM
matrix.h http://pastebin.com/TYFYCTY7
matrix.cpp http://pastebin.com/wYADLJ8Y
main.cpp http://pastebin.com/48BSqGJr

g++ main.cpp matrix.cpp -o matrix -O3.

解决方案

Some thoughts:

  • Have you optimized it to consider that a non-power of two sized matrix is filled in with zeroes? I think the algorithm assumes you don't bother multiplying these terms. This is why you get the flat areas where the running time is constant between 2^n and 2^(n+1)-1. By not multiplying terms you know are zero you should be able to improve these areas. Or perhaps Strassen is only meant to work with 2^n sized matrices.
  • Consider that a "large" matrix is arbitrary and the algorithm is only slightly better than the naive case, O(N^3) vs O(N^2.8). You may not see measurable gains until bigger matrices are tried. For example, I was did some Finite Element modeling where 10,000x10,000 matrices were considered "small". Its hard to tell from your graph but it looks like the 511 case may be faster in the Stassen case.
  • Try testing with various optimization levels including no optimizations at all.
  • This algorithm seems to assume that multiplications are much more expensive than additions. This was certainly true 40 years ago when it was first developed but I believe in more modern processors the difference between add and multiply has gotten smaller. This may reduce the effectiveness of the algorithm which seems to reduce multiplications but increase additions.
  • Have you looked at some of the other Strassen implementations out there for ideas? Try benchmarking a known good implementation to see exactly how much faster you can get.

这篇关于矩阵乘法:Strassen与标准的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆