Vector-Matrix-Multiplication is very slow in the OpenCV C++ interface

Views: 310 | Published: 2016/11/1 21:09:32 | Tags: c++, performance, opencv

Problem description

I have determined with the "Random-Stop-Method" that the following two lines appear to be very slow:

cv::Mat pixelSubMue = pixel - vecMatMue[kk_real];   // ca. 35.5 %
cv::Mat pixelTemp = pixelSubMue * covInvRef;        // ca. 58.1 %
cv::multiply(pixelSubMue, pixelTemp, pixelTemp);    // ca. 0 %
cv::Scalar sumScalar = cv::sum(pixelTemp);          // ca. 3.2 %

double cost = sumScalar.val[0] * 0.5 + vecLogTerm[kk_real]; // ca. 3.2 %

vecMatMue is a std::vector<cv::Mat> (I know there is a lot of copying involved, but using pointers does not change much in performance here)
pixelSubMue is a cv::Mat(1, 3, CV_64FC1) vector
covInvRef is a reference to a cv::Mat(3, 3, CV_64FC1) matrix
vecLogTerm is a std::vector<double>
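
For reference, the four OpenCV calls in the snippet (subtract, matrix product, element-wise multiply, sum) together evaluate a single quadratic form. The symbols below are my own shorthand, not the author's: x stands for pixel, \mu_k for vecMatMue[kk_real], \Sigma_k^{-1} for covInvRef (the name suggests an inverse covariance matrix, which is an assumption), and c_k for vecLogTerm[kk_real]. With x and \mu_k treated as 1x3 row vectors:

```latex
\[
\mathrm{cost}_k \;=\; \tfrac{1}{2}\,(x - \mu_k)\,\Sigma_k^{-1}\,(x - \mu_k)^{\top} \;+\; c_k
\]
```

The element-wise product followed by cv::sum is just the dot product of (x - \mu_k)\Sigma_k^{-1} with (x - \mu_k), so each evaluation needs only a handful of scalar multiplications and additions.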

The code snippet above is in an inner loop that is called millions of times.

Question: Is there a way to improve the speed of that operation?

Edit: Thanks for the comments! I have now measured the time within the program; the percentages indicate how much of the time is spent on each line. The measurements were done in Release mode. I took six measurements, and each time the code was executed millions of times.

I should probably also mention that the std::vector objects have no effect on the performance; I just replaced them with constant objects.

Edit 2: I have also implemented the algorithm using the C API. The relevant lines now look like this:
cvSub(pixel, vecPMatMue[kk], pixelSubMue);                   // ca. 24.4 %
cvMatMulAdd(pixelSubMue, vecPMatFCovInv[kk], 0, pixelTemp);  // ca. 39.0 %
cvMul(pixelSubMue, pixelTemp, pixelSubMue);                  // ca. 22.0 %
CvScalar sumScalar = cvSum(pixelSubMue);                     // ca. 14.6 %
cost = sumScalar.val[0] * 0.5 + vecFLogTerm[kk];             // ca. 0.0 %

For the same input data, the C++ implementation needs ca. 3100 ms, while the C implementation needs only ca. 2050 ms (both measurements refer to the total time for executing the snippet millions of times). I still prefer my C++ implementation, since it is easier for me to read (other "ugly" changes had to be made to make it work with the C API).
Edit 3: I have rewritten the code so that the actual calculations use no function calls:
capacity_t mue0 = meanRef.at<double>(0, 0);
capacity_t mue1 = meanRef.at<double>(0, 1);
capacity_t mue2 = meanRef.at<double>(0, 2);

capacity_t sigma00 = covInvRef.at<double>(0, 0);
capacity_t sigma01 = covInvRef.at<double>(0, 1);
capacity_t sigma02 = covInvRef.at<double>(0, 2);
capacity_t sigma11 = covInvRef.at<double>(1, 1);
capacity_t sigma12 = covInvRef.at<double>(1, 2);
capacity_t sigma22 = covInvRef.at<double>(2, 2);

mue0 = p0 - mue0; mue1 = p1 - mue1; mue2 = p2 - mue2;

capacity_t pt0 = mue0 * sigma00 + mue1 * sigma01 + mue2 * sigma02;
capacity_t pt1 = mue0 * sigma01 + mue1 * sigma11 + mue2 * sigma12;
capacity_t pt2 = mue0 * sigma02 + mue1 * sigma12 + mue2 * sigma22;

mue0 *= pt0; mue1 *= pt1; mue2 *= pt2;

capacity_t cost = (mue0 + mue1 + mue2) / 2.0 + vecLogTerm[kk_real];

Now the calculations for every pixel only need 150 ms!
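
The unrolled calculation in Edit 3 can be wrapped in a small function so the inner loop stays readable. The following is only a sketch under a few assumptions: capacity_t is taken to be double, the matrix is symmetric (Edit 3 already relies on this by reusing sigma01, sigma02 and sigma12), and the names SymInvCov3, packSymmetric and gaussianCost are hypothetical, not part of OpenCV or the original code.

```cpp
#include <array>
#include <cassert>

// Hypothetical type: the upper triangle of a symmetric 3x3 matrix
// (for example the inverse covariance read from covInvRef).
struct SymInvCov3 {
    double s00, s01, s02, s11, s12, s22;
};

// Pack a full 3x3 matrix. In debug builds this checks symmetry, because the
// unrolled formula below reuses s01, s02 and s12 for the lower triangle.
inline SymInvCov3 packSymmetric(const std::array<std::array<double, 3>, 3>& m) {
    assert(m[0][1] == m[1][0] && m[0][2] == m[2][0] && m[1][2] == m[2][1]);
    return {m[0][0], m[0][1], m[0][2], m[1][1], m[1][2], m[2][2]};
}

// cost = 0.5 * d * S * d^T + logTerm, with d = pixel - mean.
inline double gaussianCost(const std::array<double, 3>& pixel,
                           const std::array<double, 3>& mean,
                           const SymInvCov3& s, double logTerm) {
    const double d0 = pixel[0] - mean[0];
    const double d1 = pixel[1] - mean[1];
    const double d2 = pixel[2] - mean[2];

    // Row vector d times the symmetric matrix S.
    const double t0 = d0 * s.s00 + d1 * s.s01 + d2 * s.s02;
    const double t1 = d0 * s.s01 + d1 * s.s11 + d2 * s.s12;
    const double t2 = d0 * s.s02 + d1 * s.s12 + d2 * s.s22;

    // Dot product of (d * S) with d, halved, plus the log term.
    return 0.5 * (d0 * t0 + d1 * t1 + d2 * t2) + logTerm;
}
```

The idea is to call packSymmetric once per component outside the inner loop (after reading the six values with covInvRef.at<double>(...), as in Edit 3) and then call gaussianCost per pixel, so that no cv::Mat objects are created inside the loop.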
Recommended answer
It looks like you're compiling in Debug mode, which probably explains the performance hit. You can profile your code using time functions such as clock().
For example:
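clock_t start, end;
...
start = clock();
cv::Mat pixelTemp = pixelSubMue * covInvRef;    // Very SLOW!
end = clock();

// CLOCKS_PER_SEC (from <ctime>) is the standard replacement for the obsolete CLK_TCK macro
cout << "Elapsed time in seconds: " << static_cast<double>(end - start) / CLOCKS_PER_SEC << endl;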
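
A single matrix product usually finishes faster than the resolution of clock(), so timing one statement tends to report zero. One possible alternative, sketched here with std::chrono instead of clock(), is to time many repetitions and divide by the count. The helper name averageSeconds is hypothetical, and the statement being measured is passed in as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `body` `iterations` times and return the average
// wall-clock time per call, in seconds.
template <typename Fn>
double averageSeconds(Fn&& body, std::size_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

A call might look like averageSeconds([&] { sink += computeCost(); }, 1000000), where computeCost stands in for the code under test and sink is a variable that is read afterwards, so that an optimizing Release build cannot discard the work being measured.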

