Accelerating OpticalFlow Algorithm - OpenCV


Problem description



    I am working on a project for estimating a UAV's location using an optical-flow algorithm. I am currently using cv::calcOpticalFlowFarneback for this purpose.
    My hardware is an Odroid U3 that will eventually be connected to the UAV flight controller.

    The problem is that this method is computationally heavy for this hardware, and I am looking for other ways to optimize or accelerate it.

    Things that I've already tried:

    • Reducing resolution to 320x240 or even 160x120.
    • Using OpenCV TBB (compiled using WITH_TBB=ON BUILD_TBB=ON and adding -ltbb).
    • Changing optical-flow parameters as suggested here.

    Adding the relevant part of my code:

    int opticalFlow(){
    
        // capture from camera
        VideoCapture cap(0);
        if( !cap.isOpened() )
            return -1;
    
        // Set Resolution - The Default Resolution Is 640 x 480
        cap.set(CV_CAP_PROP_FRAME_WIDTH,WIDTH_RES);
        cap.set(CV_CAP_PROP_FRAME_HEIGHT,HEIGHT_RES);
    
        Mat flow, cflow, undistortFrame, processedFrame, origFrame, croppedFrame;
        UMat gray, prevgray, uflow;
    
        currLocation.x = 0;
        currLocation.y = 0;
    
        // for each frame calculate optical flow
        for(;;)
        {
            // take out frame- still distorted
            cap >> origFrame;
    
            // Convert to gray
            cvtColor(origFrame, processedFrame, COLOR_BGR2GRAY);
    
            // rotate image - perspective transformation
            rotateImage(processedFrame, gray, eulerFromSensors.roll, eulerFromSensors.pitch, 0, 0, 0, 1, cameraMatrix.at<double>(0,0),
            cameraMatrix.at<double>(0,2),cameraMatrix.at<double>(1,2));
    
            if( !prevgray.empty() )
            {
                // calculate flow
                calcOpticalFlowFarneback(prevgray, gray, uflow, 0.5, 3, 10, 3, 3, 1.2, 0);
                uflow.copyTo(flow);
    
                // get average
                calcAvgOpticalFlow(flow, 16, corners);
    
                /*
                Some other calculations
                .
                .
                .
                Updating currLocation struct
                */
            }
            //break conditions
            if(waitKey(1)>=0)
                break;
            if(end_run)
                break;
            std::swap(prevgray, gray);
        }
        return 0;
    }
    

    Notes:

    • I've run callgrind, and the bottleneck is, as expected, the calcOpticalFlowFarneback function.
    • I checked the CPU core load while running the program, and it is not using all 4 cores heavily; only one core is at 100% at a given time (even with TBB).

    Solution

    First, I want to say thanks for this answer, which I used to build my final solution; I will explain it in as much detail as I can.

    My solution is divided into two parts:

    1. Multithreading - Splitting each frame into 4 matrices, one quarter per matrix, then creating 4 threads and processing each quarter in a different thread. I created the 4 quarter matrices with some (5%) overlap between them so that I won't lose the flow at the seams between them (see figure below - the yellow part is 55% of the width and 55% of the height).

      Q1 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(0, WIDTH_RES*0.55));
      Q2 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(WIDTH_RES*0.45, WIDTH_RES));
      Q3 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(0, WIDTH_RES*0.55));
      Q4 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(WIDTH_RES*0.45, WIDTH_RES));
      

      Each thread does the optical-flow processing (part 2 below) on its quarter, and the main loop waits for all threads to finish before collecting and averaging the results.

    2. Using a sparse method - Using the calcOpticalFlowPyrLK method on a grid of points within a selected ROI instead of using calcOpticalFlowFarneback. The Lucas-Kanade sparse method consumes much less CPU time than the Farneback dense method. In my case I created a grid with gridstep=10. This is the simple function for creating the grid:

      void createGrid(vector<cv::Point2f> &grid, int16_t wRes, int16_t hRes, int step){
          for (int i = 0; i < wRes; i += step)
              for (int j = 0; j < hRes; j += step)
                  grid.push_back(cv::Point2f(i, j));
      }
      

      Note that if the grid is constant during the whole run, it is better to only create it once before entering the main loop.

    After implementing both parts, when running the program, all 4 cores of the Odroid U3 were constantly working at 60%-80%, and performance improved significantly.
