Eigen3 matrix multiplication performance

Problem description

Note: I have also posted this on the Eigen forum.

I want to premultiply 3xN matrices by a 3x3 matrix, i.e., to transform 3D points, like p_dest = T * p_source

After initializing the matrices:

Eigen::Matrix<double, 3, Eigen::Dynamic> points = Eigen::Matrix<double, 3, Eigen::Dynamic>::Random(3, NUMCOLS);
Eigen::Matrix<double, 3, Eigen::Dynamic> dest = Eigen::Matrix<double, 3, Eigen::Dynamic>(3, NUMCOLS);
int NT = 100;

I have evaluated these two versions:

// eigen direct multiplication
for (int i = 0; i < NT; i++){
  Eigen::Matrix3d T = Eigen::Matrix3d::Random();
  dest.noalias() = T * points;
}

// col multiplication
for (int i = 0; i < NT; i++){
  Eigen::Matrix3d T = Eigen::Matrix3d::Random();
  for (int c = 0; c < points.cols(); c++){
    dest.col(c) = T * points.col(c);
  }
}

The NT repetitions are only there to compute an average time.
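
For readers who want to reproduce the averaging, here is a minimal sketch of a timing helper. The name averageSeconds is hypothetical (it appears in neither the question nor Eigen), and the sketch assumes repetitions is greater than zero.

```cpp
#include <chrono>

// Hypothetical helper (not part of Eigen or the original post): runs `work`
// `repetitions` times and returns the mean wall-clock time per run in seconds.
// Assumes repetitions > 0.
template <typename Work>
double averageSeconds(Work&& work, int repetitions) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto begin = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const std::chrono::duration<double> total = Clock::now() - begin;
    return total.count() / repetitions;
}
```

It could be called as averageSeconds([&] { dest.noalias() = T * points; }, NT). Note that a helper like this only measures work the compiler actually emits; it does nothing by itself to keep unused results from being optimized away.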

I am surprised that the column-by-column multiplication is about 4 to 5 times faster than the direct multiplication. (The direct multiplication is even slower if I do not use .noalias(), but that is expected, since it then makes a temporary copy.) I have varied NUMCOLS from 0 to 1000000, and the relation is linear.

I'm using Visual Studio 2013 and compiling in Release mode.

The next figure shows the number of matrix columns on the X axis and the average time for a single operation on the Y axis; blue is the column-by-column multiplication and red is the matrix multiplication.

Any suggestions as to why this happens?

Recommended answer

Short answer

You're timing a lazy evaluation that never actually happens in the col multiplication version, versus a lazy evaluation that does get evaluated in the direct version.

Instead of code snippets, let's look at a full MCVE. First, your version:

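#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <Eigen/Dense>

using Eigen::Matrix3Xd;

void ColMult(Matrix3Xd& dest, Matrix3Xd& points)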
{
    Eigen::Matrix3d T = Eigen::Matrix3d::Random();
    for (int c = 0; c < points.cols(); c++){
        dest.col(c) = T * points.col(c);
    }
}

void EigenDirect(Matrix3Xd& dest, Matrix3Xd& points)
{
    Eigen::Matrix3d T = Eigen::Matrix3d::Random();
    dest.noalias() = T * points;
}

int main(int argc, char *argv[])
{
    srand(time(NULL));

    int NUMCOLS = 100000 + rand();

    Matrix3Xd points = Matrix3Xd::Random(3, NUMCOLS);
    Matrix3Xd dest   = Matrix3Xd(3, NUMCOLS);
    Matrix3Xd dest2  = Matrix3Xd(3, NUMCOLS);
    int NT = 200;
    // eigen direct multiplication
    auto beg1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < NT; i++)
    {
        EigenDirect(dest, points);
    }
    auto end1 = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> elapsed_seconds = end1-beg1;

    // col multiplication
    auto beg2 = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < NT; i++)
    {
        ColMult(dest2, points);
    }

    auto end2 = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> elapsed_seconds2 = end2-beg2;
    std::cout << "Direct time: " << elapsed_seconds.count() << "\n";
    std::cout << "Col time: " << elapsed_seconds2.count() << "\n";

    std::cout << "Eigen speedup: " << elapsed_seconds2.count() / elapsed_seconds.count() << "\n\n";
    return 0;
}

With this code (and SSE turned on), I get:

Direct time: 0.449301
Col time: 0.10107
Eigen speedup: 0.224949

That is the same 4-5x slowdown you complained of. Why? Before we get to the answer, let's modify the code a bit so that the dest matrices are sent to an ostream. Add std::ostream outPut(0); to the beginning of main(), and before ending each timer add outPut << dest << "\n\n"; and outPut << dest2 << "\n\n"; respectively. The std::ostream outPut(0); doesn't output anything (I'm pretty sure the badbit is set), but it does cause Eigen's operator<< to be called, which forces the evaluation of the matrix.

NOTE: if we used outPut << dest(1,1) then dest would be evaluated only enough to output the single element in the col multiplication method.

Then we get:

Direct time: 0.447298
Col time: 0.681456
Eigen speedup: 1.52349

This is the expected result. Note that the Eigen direct method took (roughly) the same time, meaning the evaluation took place even without the added ostream, whereas the col method suddenly took much longer.
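
Another common way to give benchmark results an observable use is to fold the destination into a single number and print or return it after the timed loop. The following is only a sketch in plain C++ without Eigen: Mat3, Points, transformColumns and checksum are hypothetical names introduced for illustration, and the storage layouts are assumptions of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical plain-C++ stand-ins for Eigen's types (assumptions of this sketch).
using Mat3 = std::array<double, 9>;  // 3x3 matrix, row-major
using Points = std::vector<double>;  // 3xN matrix, column-major: x0 y0 z0 x1 y1 z1 ...

// Column-by-column product dst = T * src; dst must already hold 3*N values.
void transformColumns(const Mat3& T, const Points& src, Points& dst) {
    const std::size_t cols = src.size() / 3;
    for (std::size_t c = 0; c < cols; ++c) {
        const double x = src[3 * c];
        const double y = src[3 * c + 1];
        const double z = src[3 * c + 2];
        dst[3 * c]     = T[0] * x + T[1] * y + T[2] * z;
        dst[3 * c + 1] = T[3] * x + T[4] * y + T[5] * z;
        dst[3 * c + 2] = T[6] * x + T[7] * y + T[8] * z;
    }
}

// Sums every element. Printing or returning this value after the timed loop
// means the computed results are actually used by the program.
double checksum(const Points& values) {
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

Printing checksum(dst) once the timers have stopped plays the same role as streaming the whole matrix to outPut, but touches each element only once and adds no formatting cost to the measurement.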
