How to speed up Eigen library's matrix product?


Problem description

I'm studying simple multiplication of two big matrices using the Eigen library. This multiplication appears to be noticeably slower than both Matlab and Python for the same size matrices.

Is there anything to be done to make the Eigen operation faster?

Problem details

X : random 1000 x 50000 matrix

Y : random 50000 x 300 matrix

Timing experiment (run on my late-2011 MacBook Pro)

Using Matlab: X*Y takes ~1.3 sec

Using Enthought Python: numpy.dot(X, Y) takes ~2.2 sec

Using Eigen: X*Y takes ~2.7 sec

Eigen details

You can get my Eigen code (as a MEX function): https://gist.github.com/michaelchughes/4742878

This MEX function reads in two matrices from Matlab, and returns their product.

Running this MEX function without the matrix product operation (i.e., just doing the IO) produces negligible overhead, so the IO between the function and Matlab doesn't explain the big difference in performance. It's clearly the actual matrix product operation.
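
As a rough illustration of what such a MEX gateway can look like, here is a minimal sketch assuming Eigen 3.1 and the standard MEX C++ API; the file name and include path are hypothetical, and this is not necessarily the code in the gist above:

// Hypothetical eigenMatProd.cpp: multiply two double matrices passed from Matlab using Eigen.
// Built from the Matlab prompt with something like:
//   mex CXXOPTIMFLAGS="-O3 -DNDEBUG" -I/path/to/eigen eigenMatProd.cpp
#include "mex.h"
#include <Eigen/Dense>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Matlab stores matrices column-major, which matches Eigen's default layout,
    // so the input buffers can be mapped without copying.
    const mwSize m = mxGetM(prhs[0]);   // rows of X
    const mwSize k = mxGetN(prhs[0]);   // cols of X == rows of Y
    const mwSize n = mxGetN(prhs[1]);   // cols of Y

    Eigen::Map<const Eigen::MatrixXd> X(mxGetPr(prhs[0]), m, k);
    Eigen::Map<const Eigen::MatrixXd> Y(mxGetPr(prhs[1]), k, n);

    plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);
    Eigen::Map<Eigen::MatrixXd> Z(mxGetPr(plhs[0]), m, n);

    Z.noalias() = X * Y;   // the operation being timed
}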

I'm compiling with g++, with these optimization flags: "-O3 -DNDEBUG"

I'm using the latest stable Eigen header files (3.1.2).

Any suggestions on how to improve Eigen's performance? Can anybody replicate the gap I'm seeing?

UPDATE: The compiler really seems to matter. The original Eigen timing was done using Apple Xcode's version of g++: llvm-g++-4.2.

When I use g++-4.7 downloaded via MacPorts (same CXXOPTIMFLAGS), I get 2.4 sec instead of 2.7.

Any other suggestions on how to compile better would be much appreciated.

You can also get raw C++ code for this experiment: https://gist.github.com/michaelchughes/4747789

./MatProdEigen 1000 50000 300

reports 2.4 seconds under g++-4.7

Recommended answer

First of all, when doing this kind of performance comparison, make sure you disable turbo boost (TB). On my system, using gcc 4.5 from MacPorts and without turbo boost, I get 3.5 s; that corresponds to 8.4 GFLOPS, while the theoretical peak of my 2.3 GHz Core i7 is 9.2 GFLOPS, so not too bad.
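
(For reference, this product takes roughly 2 x 1000 x 50000 x 300 ≈ 3 x 10^10 floating-point operations, which is where the GFLOPS figure comes from when divided by the measured time, and the 9.2 GFLOPS peak presumably corresponds to 2.3 GHz x 4 double-precision flops per cycle with SSE, since Eigen 3.1 does not use AVX.)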

Matlab is based on Intel MKL, and judging from the reported performance, it is clearly using a multithreaded version. It is unlikely that a small library like Eigen can beat Intel on its own CPUs!

Numpy can use any BLAS library: Atlas, MKL, OpenBLAS, eigen-blas, etc. I guess that in your case it was using Atlas, which is fast too.

Finally, here is how you can get better performance: enable multithreading in Eigen by compiling with -fopenmp. By default, Eigen uses the number of threads defined by OpenMP. Unfortunately, that number corresponds to the number of logical cores rather than physical cores, so make sure hyper-threading is disabled, or set the OMP_NUM_THREADS environment variable to the number of physical cores. With that, I get 1.25 s without TB and 0.95 s with TB.
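
As a concrete sketch of that suggestion, a standalone version of the benchmark might look like the following; the thread count of 4 and the compile line in the comment are assumptions, and Eigen::setNbThreads is simply an in-code alternative to setting OMP_NUM_THREADS externally (assuming Eigen 3.1 or newer):

// Hypothetical standalone benchmark using Eigen's OpenMP-parallel matrix product.
// Compile with something like: g++ -O3 -DNDEBUG -fopenmp -I/path/to/eigen matprod.cpp -o matprod
#include <Eigen/Dense>
#include <iostream>

int main()
{
    // Restrict Eigen's thread pool to the number of physical cores (4 here is an assumption);
    // alternatively, export OMP_NUM_THREADS=4 before running.
    Eigen::setNbThreads(4);

    Eigen::MatrixXd X = Eigen::MatrixXd::Random(1000, 50000);
    Eigen::MatrixXd Y = Eigen::MatrixXd::Random(50000, 300);

    Eigen::MatrixXd Z = X * Y;          // multithreaded GEMM when built with -fopenmp
    std::cout << Z(0, 0) << std::endl;  // keep the result live so the product is not optimized away
    return 0;
}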
