Accelerating OpticalFlow Algorithm - OpenCV
Problem description
I am working on a project for estimating a UAV location using an optical-flow algorithm. I am currently using cv::calcOpticalFlowFarneback for this purpose.
My hardware is an Odroid U3 that will finally be connected to the UAV flight controller.
The problem is that this method is really heavy for this hardware and I am looking for some other ways to optimize / accelerate it.
Things that I've already tried:
- Reducing resolution to 320x240 or even 160x120.
- Using OpenCV TBB (compiled with WITH_TBB=ON BUILD_TBB=ON and linking with -ltbb).
- Changing optical-flow parameters as suggested here.
Adding the relevant part of my code:
int opticalFlow(){
    // capture from camera
    VideoCapture cap(0);
    if( !cap.isOpened() )
        return -1;
    // Set Resolution - The Default Resolution Is 640 x 480
    cap.set(CV_CAP_PROP_FRAME_WIDTH, WIDTH_RES);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, HEIGHT_RES);
    Mat flow, cflow, undistortFrame, processedFrame, origFrame, croppedFrame;
    UMat gray, prevgray, uflow;
    currLocation.x = 0;
    currLocation.y = 0;
    // for each frame calculate optical flow
    for(;;)
    {
        // take out frame - still distorted
        cap >> origFrame;
        // Convert to gray
        cvtColor(origFrame, processedFrame, COLOR_BGR2GRAY);
        // rotate image - perspective transformation
        rotateImage(processedFrame, gray, eulerFromSensors.roll, eulerFromSensors.pitch, 0, 0, 0, 1,
                    cameraMatrix.at<double>(0,0), cameraMatrix.at<double>(0,2), cameraMatrix.at<double>(1,2));
        if( !prevgray.empty() )
        {
            // calculate flow
            calcOpticalFlowFarneback(prevgray, gray, uflow, 0.5, 3, 10, 3, 3, 1.2, 0);
            uflow.copyTo(flow);
            // get average
            calcAvgOpticalFlow(flow, 16, corners);
            /*
            Some other calculations
            .
            .
            .
            Updating currLocation struct
            */
        }
        // break conditions
        if(waitKey(1) >= 0)
            break;
        if(end_run)
            break;
        std::swap(prevgray, gray);
    }
    return 0;
}
Notes:
- I've run callgrind and the bottleneck is, as expected, the calcOpticalFlowFarneback function.
- I checked the CPU core load while running the program, and it is not using all 4 cores heavily; only one core is at 100% at a given time (even with TBB):
First, I want to say thanks for the answer below, which I used to build my final solution. I will explain it with as many details as I can.
My solution is divided into two parts:
Multithreading - Splitting each frame into 4 matrices, each quarter in a different matrix. Creating 4 threads and running each quarter's processing in a different thread. I created the 4 quarter matrices such that there is some (5%) overlap between them so that I won't lose the connection between them (see figure below - the yellow part spans 55% of the width and 55% of the height).
Q1 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(0, WIDTH_RES*0.55));
Q2 = cv::UMat(gray, Range(0, HEIGHT_RES*0.55), Range(WIDTH_RES*0.45, WIDTH_RES));
Q3 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(0, WIDTH_RES*0.55));
Q4 = cv::UMat(gray, Range(0.45*HEIGHT_RES, HEIGHT_RES), Range(WIDTH_RES*0.45, WIDTH_RES));
Each thread is doing the optical flow processing (part 2 below) on a quarter and the main loop is waiting for all threads to finish in order to collect the results and averaging.
Using a sparse method - using calcOpticalFlowPyrLK within a selected ROI grid instead of using calcOpticalFlowFarneback. Using the Lucas-Kanade sparse method instead of the Farneback dense method consumes much less CPU time. In my case I created a grid with gridstep=10. This is the simple function for creating the grid:

void createGrid(vector<cv::Point2f> &grid, int16_t wRes, int16_t hRes, int step){
    for (int i = 0; i < wRes; i += step)
        for (int j = 0; j < hRes; j += step)
            grid.push_back(cv::Point2f(i, j));
}
Note that if the grid is constant during the whole run, it is better to only create it once before entering the main loop.
After implementing both parts, when running the program, all 4 cores of the Odroid U3 were constantly working at 60%-80% and performance improved noticeably.