OpenCV: C++ and C performance comparison

Problem description

Right now I'm developing an application using the OpenCV API (C++). This application does video processing.

On the PC everything works really fast. Today I decided to port this application to Android (to use the camera as video input). Fortunately, there's OpenCV for Android, so I just added my native code to a sample Android application. Everything works fine except performance. I benchmarked my application and found that it runs at 4-5 fps, which is not acceptable (my device has a single-core 1 GHz processor). I want it to run at about 10 fps.

Does it make sense to fully rewrite my application in C? I know that using things such as std::vector is much more comfortable for the developer, but I don't care about that.

It seems that OpenCV's C interface has the same functions/methods as the C++ interface.

I googled this question but didn't find anything.

Thanks for any advice.

Recommended answer

I've worked quite a lot with Android and optimizations (I wrote a video processing app that processes a frame in 4 ms), so I hope I can give you some pertinent answers.

There is not much difference between the C and C++ interfaces in OpenCV. Some of the code is written in C and has a C++ wrapper, and some vice versa. Any significant differences between the two (as measured by Shervin Emami) are due to regressions, bug fixes or quality improvements. You should stick with the latest OpenCV version.

Why not rewrite?

You will spend a good deal of time that you could use much better. The C interface is cumbersome, and the chance of introducing bugs or memory leaks is high. In my opinion, you should avoid it.

Optimization tips

A. Turn on optimizations.

Both compiler optimizations and the lack of debug assertions can make a big difference in your running time.
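
To make the point about debug assertions concrete, here is a minimal sketch (not from the answer) built around a hypothetical `sumPixels` helper. It assumes a GCC/Clang-style build where the release configuration passes something like `-O2 -DNDEBUG`; the exact flags depend on your build system. The `assert` in the hot loop is evaluated once per pixel in a debug build and compiles to nothing once `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): sum of all pixel values in a
// grayscale buffer. In a debug build the assert below runs for every pixel;
// with -DNDEBUG it is removed entirely, and with -O2 the compiler is also
// free to unroll or vectorize the loop.
std::uint64_t sumPixels(const std::vector<std::uint8_t>& pixels) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        assert(i < pixels.size());  // debug-only check, free in release builds
        total += pixels[i];
    }
    return total;
}

// Reports whether this translation unit was compiled with assertions enabled.
bool assertionsEnabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}
```

Timing the same function built once with `-O0` and once with `-O2 -DNDEBUG` on a full frame is a quick way to see how large this effect is on your device.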

B. Profile your app.

Do it first on your computer, since it is much easier. Use the Visual Studio profiler to identify the slow parts. Optimize them. Never optimize because you think something is slow, but because you measured it. Start with the slowest function, optimize it as much as possible, then take the second slowest. Measure your changes to make sure the code is indeed faster.
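
A profiler shows where the time goes; to confirm that a change really made something faster, a plain wall-clock timer is often enough, both on the PC and on the device. This is a minimal sketch using only `std::chrono`; `averageMillis` is a hypothetical helper, and `processFrame` in the usage line below stands in for your own per-frame function.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (illustration only): runs `work` `iterations` times and
// returns the average wall-clock time per call in milliseconds. Use it to
// compare a function before and after a change.
double averageMillis(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

For example, `double ms = averageMillis([&] { processFrame(frame); }, 100);` gives the average cost per frame, and `1000.0 / ms` the frame rate that step alone would allow. Averaging over many runs smooths out timer resolution and one-off stalls.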

C. Focus on algorithms.

A faster algorithm can improve performance by orders of magnitude (100x). A C++ trick will give you maybe a 2x performance boost.
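
As an illustration of the algorithmic point (a swapped-in example, not from the answer), consider a box sum with window radius `r`. The naive version re-adds `2r + 1` values per output; a prefix-sum version computes every output with one subtraction, so its cost no longer depends on `r`. Both functions are hypothetical helpers written for this sketch; the 2-D form of the same idea is the integral image, which OpenCV exposes as `cv::integral`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (illustration only): naive box sum.
// out[i] = sum of in[j] for j in [i - radius, i + radius], clipped to the array.
// Cost: O(n * (2 * radius + 1)).
std::vector<long long> boxSumNaive(const std::vector<int>& in, int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(in.size());
    std::vector<long long> out(in.size(), 0);
    for (int i = 0; i < n; ++i) {
        long long s = 0;
        for (int j = i - radius; j <= i + radius; ++j) {
            if (j >= 0 && j < n) s += in[j];  // skip positions outside the array
        }
        out[i] = s;
    }
    return out;
}

// Hypothetical helper (illustration only): same result using prefix sums.
// Cost: O(n), independent of radius.
std::vector<long long> boxSumPrefix(const std::vector<int>& in, int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(in.size());
    // prefix[k] holds in[0] + ... + in[k - 1]
    std::vector<long long> prefix(in.size() + 1, 0);
    for (int i = 0; i < n; ++i) prefix[i + 1] = prefix[i] + in[i];
    std::vector<long long> out(in.size(), 0);
    for (int i = 0; i < n; ++i) {
        const int lo = i - radius < 0 ? 0 : i - radius;          // first index in window
        const int hi = i + radius + 1 > n ? n : i + radius + 1;  // one past the last
        out[i] = prefix[hi] - prefix[lo];
    }
    return out;
}
```

With `r = 15` the naive loop performs 31 additions per output, while the prefix-sum version needs one addition to build the prefix array and one subtraction per output.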

Classic tricks:

  • Resize your video frames to be smaller. Often you can extract the information from a 200x300px image instead of a 1024x768 one. The area of the first one is more than 10 times smaller.

  • Use simpler operations instead of complicated ones. Use integers instead of floats. And never use double in a matrix or a for loop that executes thousands of times.

  • Do as little calculation as possible. Can you track an object only in a specific area of the image, instead of processing all of it for every frame? Can you make a rough/approximate detection on a very small image and then refine it on a ROI in the full frame?
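
A sketch of the first two tricks combined, assuming a single-channel 8-bit frame stored row by row without padding: halve the resolution by averaging each 2x2 block with integer arithmetic only. `halfSizeGray` is a hypothetical helper; in OpenCV you would normally reach for `cv::resize` or `cv::pyrDown`, but measure them first, given the warning about scaling functions in section E.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): halves a grayscale image
// (row-major, 1 byte per pixel, no row padding) by averaging each 2x2 block
// with integer arithmetic. The result has a quarter of the pixels.
std::vector<std::uint8_t> halfSizeGray(const std::vector<std::uint8_t>& src,
                                       int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int outW = width / 2, outH = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        // The two source rows that feed output row y.
        const std::uint8_t* row0 = src.data() + static_cast<std::size_t>(2 * y) * width;
        const std::uint8_t* row1 = row0 + width;
        for (int x = 0; x < outW; ++x) {
            const int sum = row0[2 * x] + row0[2 * x + 1] + row1[2 * x] + row1[2 * x + 1];
            // +2 rounds to the nearest integer instead of truncating.
            dst[static_cast<std::size_t>(y) * outW + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```

Each halving leaves a quarter of the pixels, so every later per-pixel step does about 4x less work; applying it twice to a 640x480 frame gives 160x120.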

D. Use C where it matters

In loops, it may make sense to use C style instead of C++. A pointer to a data matrix or a float array is much faster than mat.at or std::vector<>. Often the bottleneck is a nested loop. Focus on it. It doesn't make sense to replace vector<> all over the place and spaghettify your code.
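
A sketch of the pointer-based inner loop, written without OpenCV so it compiles on its own. The hypothetical `thresholdInPlace` works on a buffer with a row stride (`step`, bytes per row), the same layout `cv::Mat` uses; in OpenCV the row pointer would come from `mat.ptr<uchar>(y)`. The row pointer is computed once per row, and the inner loop only indexes into it instead of calling a per-element accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): binary threshold over a grayscale
// image stored row by row with a stride of `step` bytes per row (step >= width).
// Bytes between `width` and `step` in each row (padding) are left untouched.
void thresholdInPlace(std::uint8_t* data, int width, int height,
                      std::size_t step, std::uint8_t thresh) {
    assert(step >= static_cast<std::size_t>(width));
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + y * step;  // like mat.ptr<uchar>(y) in OpenCV
        for (int x = 0; x < width; ++x) {
            row[x] = static_cast<std::uint8_t>(row[x] > thresh ? 255 : 0);
        }
    }
}
```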

E. Avoid hidden costs

Some OpenCV functions convert data to double, process it, then convert it back to the input format. Beware of them; they kill performance on mobile devices. Examples: warping, scaling, type conversions. Also, color space conversions are known to be slow. Prefer grayscale obtained directly from the native YUV frame.
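
A sketch of "grayscale directly from YUV", assuming frames in NV21, the default preview format of the legacy `android.hardware.Camera` API. In NV21 the first `width * height` bytes are the full-resolution luma (Y) plane, followed by interleaved V/U samples at quarter resolution; the Y plane already is a grayscale image, so no conversion arithmetic is needed. `grayFromNV21` is a hypothetical helper; other formats (for example `YUV_420_888` from Camera2) can have row strides and would need a per-row copy instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): extracts the 8-bit grayscale image
// from an NV21 frame. Assumed layout: width*height luma bytes, then
// width*height/2 bytes of interleaved V/U chroma.
std::vector<std::uint8_t> grayFromNV21(const std::uint8_t* nv21,
                                       std::size_t bufferSize,
                                       int width, int height) {
    const std::size_t lumaSize = static_cast<std::size_t>(width) * height;
    assert(bufferSize >= lumaSize + lumaSize / 2);
    // The luma plane is already grayscale: just copy it.
    return std::vector<std::uint8_t>(nv21, nv21 + lumaSize);
}
```

If you use OpenCV, you can avoid the copy altogether by wrapping the Y plane in a header, `cv::Mat gray(height, width, CV_8UC1, data)`, as long as the buffer outlives the `Mat`.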

F. Use vectorization

ARM processors implement vectorization with a technology called NEON. Learn to use it. It is powerful!

A small example:

float* a, *b, *c;
// init a and b to 1000001 elements
for(int i=0;i<1000001;i++)
    c[i] = a[i]*b[i];

It can be rewritten as follows. It's more verbose, but much faster.

#include <arm_neon.h>
float* a, *b, *c;
// init a and b to 1000001 elements
const int n = 1000001;
float32x4_t a_, b_, c_;
int i;
for(i=0; i+4<=n; i+=4) // only while 4 full elements remain, so no load reads past the end
{
    a_ = vld1q_f32( &a[i] ); // load 4 floats from a in a NEON register
    b_ = vld1q_f32( &b[i] );
    c_ = vmulq_f32(a_, b_); // perform 4 float multiplies in parallel
    vst1q_f32( &c[i], c_); // store the four results in c
}
// the vector size is not always a multiple of 4 or 8 or 16.
// Process the remaining elements
for(;i<n;i++)
    c[i] = a[i]*b[i];

Purists say you must write in assembly, but for a regular programmer that's a bit daunting. I had good results using GCC intrinsics, like in the example above.
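
The fragment above assumes `a`, `b` and `c` are already allocated. Below is a self-contained sketch of the same loop as a hypothetical `multiplyArrays` function, guarded by `__ARM_NEON` / `__ARM_NEON__` (the macros GCC and Clang define when NEON code generation is enabled, e.g. with `-mfpu=neon` on 32-bit ARM) so the same file also builds and runs on a PC through the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (illustration only): c[i] = a[i] * b[i] for i in [0, n).
// With NEON enabled, 4 floats are processed per iteration; the scalar loop
// handles the tail, or everything when NEON is not available.
void multiplyArrays(const float* a, const float* b, float* c, std::size_t n) {
    assert(n == 0 || (a && b && c));
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);  // load 4 floats from a
        const float32x4_t vb = vld1q_f32(b + i);  // load 4 floats from b
        vst1q_f32(c + i, vmulq_f32(va, vb));      // 4 multiplies, then store
    }
#endif
    for (; i < n; ++i) {
        c[i] = a[i] * b[i];
    }
}
```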

Another way to jump-start is to convert the hand-coded SSE-optimized code in OpenCV into NEON. SSE is the NEON equivalent on Intel processors, and many OpenCV functions use it, like here (http://code.opencv.org/projects/opencv/repository/entry/trunk/opencv/modules/imgproc/src/filter.cpp#L1949). This is the image filtering code for uchar matrices (the regular image format). You shouldn't blindly convert instructions one by one, but take it as an example to start with.
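
To show how directly SSE code maps onto NEON, here is the same element-wise multiply written with SSE intrinsics from `<xmmintrin.h>`, with the NEON counterpart of each call noted in a comment. It is a sketch with a hypothetical name, not code taken from OpenCV's `filter.cpp`; `__m128` plays the role of `float32x4_t`.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper (illustration only): SSE version of c[i] = a[i] * b[i].
// Each intrinsic has a direct NEON analogue, noted on the right.
void multiplyArraysSSE(const float* a, const float* b, float* c, std::size_t n) {
    assert(n == 0 || (a && b && c));
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);     // NEON: vld1q_f32
        const __m128 vb = _mm_loadu_ps(b + i);     // NEON: vld1q_f32
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb));  // NEON: vmulq_f32, then vst1q_f32
    }
#endif
    for (; i < n; ++i) {
        c[i] = a[i] * b[i];  // scalar tail (or everything without SSE)
    }
}
```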

You can read more about NEON in this blog and the following posts.

G. Beware of image capture

It can be surprisingly slow on a mobile device. Optimizing it is device and OS specific.
