Vector-Matrix-Multiplication is very slow in the OpenCV C++ interface


Question

I have determined with the "random stop" method that the first two of the following lines appear to be very slow:

cv::Mat pixelSubMue = pixel - vecMatMue[kk_real];   // ca. 35.5 %
cv::Mat pixelTemp = pixelSubMue * covInvRef;        // ca. 58.1 %
cv::multiply(pixelSubMue, pixelTemp, pixelTemp);    // ca. 0 %
cv::Scalar sumScalar = cv::sum(pixelTemp);          // ca. 3.2 %

double cost = sumScalar.val[0] * 0.5 + vecLogTerm[kk_real]; // ca. 3.2 %




  • vecMatMue is a std::vector<cv::Mat>, and vecMatMue[kk_real] is one of its elements. I know there is a lot of copying involved, but using pointers does not change much in performance here.
  • pixelSubMue is a cv::Mat(1, 3, CV_64FC1) vector.
  • covInvRef is a reference to a cv::Mat(3, 3, CV_64FC1) matrix.
  • vecLogTerm is a std::vector<double>, and vecLogTerm[kk_real] is one of its elements.

The code snippet above is in an inner loop that is called millions of times.
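Read as a formula, the four OpenCV calls appear to evaluate a quadratic form plus a constant. The symbols below are labels assumed here for illustration: $x$ for `pixel` (a 1×3 row vector), $\mu_k$ for `vecMatMue[kk_real]`, $\Sigma_k^{-1}$ for `covInvRef`, and $c_k$ for `vecLogTerm[kk_real]`.

```latex
\mathrm{cost}_k \;=\; \tfrac{1}{2}\,(x - \mu_k)\,\Sigma_k^{-1}\,(x - \mu_k)^{\top} \;+\; c_k
```

The element-wise `cv::multiply` followed by `cv::sum` is the dot product of $(x - \mu_k)\,\Sigma_k^{-1}$ with $(x - \mu_k)$, which is why the whole snippet reduces to a single scalar per pixel.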

Question: Is there a way to improve the speed of that operation?

Edit: Thanks for the comments! I have now measured the time within the program; the percentages indicate how much of the time is spent on each line. The measurements were done in Release mode. I took six measurements, and each time the code was executed millions of times.

I should probably also mention that the std::vector objects have no effect on the performance: I replaced them with constant objects and the timings did not change.

Edit 2: I have also implemented the algorithm using the C API. The relevant lines now look like this:

      cvSub(pixel, vecPMatMue[kk], pixelSubMue);                   // ca. 24.4 %
      cvMatMulAdd(pixelSubMue, vecPMatFCovInv[kk], 0, pixelTemp);  // ca. 39.0 %
      cvMul(pixelSubMue, pixelTemp, pixelSubMue);                  // ca. 22.0 %
      CvScalar sumScalar = cvSum(pixelSubMue);                     // ca. 14.6 %
      cost = sumScalar.val[0] * 0.5 + vecFLogTerm[kk];             // ca. 0.0 %
      

For the same input data, the C++ implementation needs ca. 3100 ms, while the C implementation needs only ca. 2050 ms (both measurements refer to the total time for executing the snippet millions of times). I still prefer my C++ implementation, since it is easier for me to read (other "ugly" changes had to be made to make it work with the C API).

Edit 3: I have rewritten the code without using any function calls for the actual calculations:

      capacity_t mue0 = meanRef.at<double>(0, 0);
      capacity_t mue1 = meanRef.at<double>(0, 1);
      capacity_t mue2 = meanRef.at<double>(0, 2);
      
      capacity_t sigma00 = covInvRef.at<double>(0, 0);
      capacity_t sigma01 = covInvRef.at<double>(0, 1);
      capacity_t sigma02 = covInvRef.at<double>(0, 2);
      capacity_t sigma11 = covInvRef.at<double>(1, 1);
      capacity_t sigma12 = covInvRef.at<double>(1, 2);
      capacity_t sigma22 = covInvRef.at<double>(2, 2);
      
      mue0 = p0 - mue0; mue1 = p1 - mue1; mue2 = p2 - mue2;
      
      capacity_t pt0 = mue0 * sigma00 + mue1 * sigma01 + mue2 * sigma02;
      capacity_t pt1 = mue0 * sigma01 + mue1 * sigma11 + mue2 * sigma12;
      capacity_t pt2 = mue0 * sigma02 + mue1 * sigma12 + mue2 * sigma22;
      
      mue0 *= pt0; mue1 *= pt1; mue2 *= pt2;
      
      capacity_t cost = (mue0 + mue1 + mue2) / 2.0 + vecLogTerm[kk_real];
      

Now the calculations for every pixel only need 150 ms!

Recommended answer

It looks like you're compiling in Debug mode, which probably explains the performance hit. You can profile your code using time functions such as clock().

For example:

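      #include <ctime>
      #include <iostream>

      clock_t start, end;
      ...
      start = clock();
      cv::Mat pixelTemp = pixelSubMue * covInvRef;    // Very SLOW!
      end = clock();

      // CLOCKS_PER_SEC is the standard macro; CLK_TCK is an obsolete alias for it.
      std::cout << "Elapsed time in seconds: "
                << static_cast<double>(end - start) / CLOCKS_PER_SEC << std::endl;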
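A single 1×3 by 3×3 product usually finishes well below the resolution of clock(), so timing one call tends to print zero. One possible alternative, sketched here under assumptions, is to time many iterations with std::chrono::steady_clock and report the average. The helper name `averageNanoseconds` is hypothetical, and the work passed in should have an observable side effect, otherwise an optimizing compiler may remove it.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `iterations` times and returns the
// average wall-clock time per call in nanoseconds (0.0 for zero iterations).
template <typename Work>
double averageNanoseconds(Work work, std::size_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = Clock::now();
    // Convert the elapsed interval to floating-point nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

In the question's setting, the callable would wrap one line of the inner loop, for example the `pixelSubMue * covInvRef` product, with its result written to a variable that is used afterwards.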
      
