OpenCV GPU Farneback Optical Flow works badly in multi-threading

Problem Description

My application uses the OpenCV GPU class gpu::FarnebackOpticalFlow to compute the optical flow between pairs of consecutive frames of an input video. To speed up the process, I used OpenCV's TBB support to run the method in multiple threads. However, the multi-threaded version does not behave like the single-threaded one. To give you an idea of the difference, here are two snapshots, from the single-threaded and the multi-threaded implementation respectively.

The multi-threaded implementation splits the image into 8 horizontal stripes (the number of cores on my PC) and applies the GPU Farneback optical flow method to each of them. Here is the corresponding code for both methods:

Single-threaded implementation

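/* main.cpp */
#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;
using namespace cv::gpu;

void getFlowField(const Mat& u, const Mat& v, Mat& flowField); // defined below

//prevImg and img are the input Mat images extracted from the input video
...
GpuMat gpuImg8U(img);           // upload to the GPU
GpuMat gpuPrevImg8U(prevImg);
GpuMat u_flow, v_flow;          // filled with CV_32FC1 x- and y-flow
gpu::FarnebackOpticalFlow farneback_flow;
farneback_flow.numLevels = maxLayer;
farneback_flow.pyrScale = 0.5;
farneback_flow.winSize = windows_size;
farneback_flow.numIters = of_iterations;
farneback_flow(gpuPrevImg8U, gpuImg8U, u_flow, v_flow);
optical_flow.create(img.size(), CV_32FC2);  // getFlowField writes into it, so it must be allocated
getFlowField(Mat(u_flow), Mat(v_flow), optical_flow);  // Mat(GpuMat) downloads to host memory

...
}

// Interleaves the separate x (u) and y (v) flow planes into one Point2f image.
void getFlowField(const Mat& u, const Mat& v, Mat& flowField){
    for (int i = 0; i < flowField.rows; ++i){
        const float* ptr_u = u.ptr<float>(i);
        const float* ptr_v = v.ptr<float>(i);
        Point2f* row = flowField.ptr<Point2f>(i);

        for (int j = 0; j < flowField.cols; ++j){
            row[j].y = ptr_v[j];
            row[j].x = ptr_u[j];
        }
    }
}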

Multi-threaded implementation

/* parallel.h */
#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>

class ParallelOpticalFlow : public cv::ParallelLoopBody {

    private:
        int coreNum;
        cv::gpu::GpuMat img, img2;
        cv::gpu::FarnebackOpticalFlow& farneback_flow;  // one instance shared by all worker threads
        const cv::gpu::GpuMat u_flow, v_flow;           // full-size outputs; each thread writes one stripe
        cv::Mat& optical_flow;

    public:
        ParallelOpticalFlow(int cores, cv::gpu::FarnebackOpticalFlow& flowHandler, cv::gpu::GpuMat img_, cv::gpu::GpuMat img2_, const cv::gpu::GpuMat u, const cv::gpu::GpuMat v, cv::Mat& of)
                    // initializers listed in member declaration order
                    : coreNum(cores), img(img_), img2(img2_), farneback_flow(flowHandler), u_flow(u), v_flow(v), optical_flow(of){}

        virtual void operator()(const cv::Range& range) const;

};


/* parallel.cpp*/
#include "parallel.h"

void getFlowField(const cv::Mat& u, const cv::Mat& v, cv::Mat& flowField); // from main.cpp

void ParallelOpticalFlow::operator()(const cv::Range& range) const {

    for (int k = range.start ; k < range.end ; k ++){
        // Stripe k covers rows [y, y + h) of every image (all share one size);
        // the last stripe also takes the rows left over by the integer division.
        const int stripe = img.rows / coreNum;
        const int y = stripe * k;
        const int h = (k == coreNum - 1) ? img.rows - y : stripe;

        cv::gpu::GpuMat img_rect(img, cv::Rect(0, y, img.cols, h));
        cv::gpu::GpuMat img2_rect(img2, cv::Rect(0, y, img2.cols, h));
        cv::gpu::GpuMat u_rect(u_flow, cv::Rect(0, y, u_flow.cols, h));
        cv::gpu::GpuMat v_rect(v_flow, cv::Rect(0, y, v_flow.cols, h));
        cv::Mat of_rect(optical_flow, cv::Rect(0, y, optical_flow.cols, h));

        farneback_flow(img_rect, img2_rect, u_rect, v_rect);
        getFlowField(cv::Mat(u_rect), cv::Mat(v_rect), of_rect);
    }
}

/* main.cpp */

    // The outputs must already have the full image size, otherwise the
    // stripe ROIs taken in operator() are out of range.
    u_flow.create(gpuImg8U.size(), CV_32FC1);
    v_flow.create(gpuImg8U.size(), CV_32FC1);
    optical_flow.create(img.size(), CV_32FC2);
    parallel_for_(Range(0,cores_num),ParallelOpticalFlow(cores_num,farneback_flow,gpuPrevImg8U,gpuImg8U,u_flow,v_flow,optical_flow));

The code looks equivalent in the two cases. Can anyone explain why the behaviour differs, or point out mistakes in my code? Thanks in advance for your answers.

Recommended Answer

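The GPU module is not thread-safe. It uses global state, such as __constant__ memory and the texture reference API, which can lead to data races when it is used from several threads at once: one thread can rebind a texture reference or overwrite constant parameters between another thread's setup and its kernel launch.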
