OpenCV CUDA running slower than OpenCV CPU

Problem description

I've been struggling to get OpenCV CUDA to improve performance for operations like erode/dilate and frame differencing when I read in a video from an AVI file. Typically I get half the FPS on the GPU (GTX 580) that I get on the CPU (AMD 955BE). Before you ask whether I'm measuring FPS correctly: you can clearly see the lag on the GPU with the naked eye, especially when using a high erode/dilate level.

It seems that I'm not reading in the frames in parallel? Here is the code:

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <stdlib.h>
#include <stdio.h>

using namespace cv;
using namespace cv::gpu;

Mat cpuSrc;
GpuMat src, dst;

int element_shape = MORPH_RECT;

//the address of variable which receives trackbar position update
int max_iters = 10;
int open_close_pos = 0;
int erode_dilate_pos = 0;

// callback function for open/close trackbar
void OpenClose(int)
{
     IplImage disp;
     Mat temp;
    int n = open_close_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an) );
    if( n < 0 )
        cv::gpu::morphologyEx(src, dst, CV_MOP_OPEN, element);
    else
        cv::gpu::morphologyEx(src, dst, CV_MOP_CLOSE, element);

    dst.download(temp);
    disp = temp;    
   // cvShowImage("Open/Close",&disp);
}

// callback function for erode/dilate trackbar
void ErodeDilate(int)
{
     IplImage disp;
     Mat temp;
    int n = erode_dilate_pos - max_iters;
    int an = n > 0 ? n : -n;
    Mat element = getStructuringElement(element_shape, Size(an*2+1, an*2+1), Point(an, an) );
    if( n < 0 )
        cv::gpu::erode(src, dst, element);
    else
        cv::gpu::dilate(src, dst, element);
    dst.download(temp);
    disp = temp;    
    cvShowImage("Erode/Dilate",&disp);
}


int main( int argc, char** argv )
{

    VideoCapture capture("TwoManLoiter.avi");

    //create windows for output images
    namedWindow("Open/Close",1);
    namedWindow("Erode/Dilate",1);

    open_close_pos = 3;
    erode_dilate_pos = 0;
    createTrackbar("iterations", "Open/Close",&open_close_pos,max_iters*2+1,NULL);
    createTrackbar("iterations", "Erode/Dilate",&erode_dilate_pos,max_iters*2+1,NULL);

    for(;;)
    {

         capture >> cpuSrc;
         if( cpuSrc.empty() )   // end of video: stop instead of uploading an empty frame
             break;
         src.upload(cpuSrc);
         GpuMat grey;
         cv::gpu::cvtColor(src, grey, CV_BGR2GRAY); 
         src = grey;

        int c;

        ErodeDilate(erode_dilate_pos);
        c = cvWaitKey(25);

        if( (char)c == 27 )
            break;

    }

    return 0;
}

The CPU implementation is the same, minus the using namespace cv::gpu line and with Mat instead of GpuMat, of course.

Thanks

Recommended answer

My guess would be that the performance gain from the GPU erode/dilate is outweighed by the memory operations of transferring the image to and from the GPU every frame. Keep in mind that memory bandwidth is a crucial factor in GPGPU algorithms, and the bandwidth between CPU and GPU even more so.
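
One way to check this guess is to time each stage of the per-frame loop separately and see whether upload/download or the morphology itself dominates. The following is only a minimal sketch: StageTimer is a hypothetical helper (not part of OpenCV), it measures wall-clock time, and it assumes the blocking (non-Stream) overloads of the gpu calls, so that each call has finished by the time it returns. The first CUDA call in a process also pays for context initialization, so the first frame is probably best ignored.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named pipeline stage.
class StageTimer {
    struct Entry {
        double totalMs = 0.0;  // summed duration of all calls, in milliseconds
        int calls = 0;         // number of times the stage was run
    };
    std::map<std::string, Entry> stages_;

public:
    // Runs fn once and adds its duration to the given stage.
    template <typename Fn>
    void run(const std::string& stage, Fn fn) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        Entry& e = stages_[stage];
        e.totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
        e.calls += 1;
    }

    // Total time recorded for a stage (0 if the stage was never run).
    double totalMs(const std::string& stage) const {
        auto it = stages_.find(stage);
        return it == stages_.end() ? 0.0 : it->second.totalMs;
    }

    // Number of recorded calls for a stage (0 if the stage was never run).
    int calls(const std::string& stage) const {
        auto it = stages_.find(stage);
        return it == stages_.end() ? 0 : it->second.calls;
    }

    // Prints the average time per call for every stage.
    void report() const {
        for (const auto& kv : stages_) {
            std::printf("%-10s %8.3f ms/call over %d calls\n",
                        kv.first.c_str(),
                        kv.second.totalMs / kv.second.calls,
                        kv.second.calls);
        }
    }
};

// Intended use inside the question's loop (names taken from the question):
//   timer.run("upload",   [&] { src.upload(cpuSrc); });
//   timer.run("erode",    [&] { cv::gpu::erode(src, dst, element); });
//   timer.run("download", [&] { dst.download(temp); });
//   ...and after the loop: timer.report();
```

If the upload and download rows together are larger than the erode row, the transfers are the bottleneck and moving the display to the GPU is worth trying. If not, the cause lies elsewhere.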

To optimize it, you might write your own image display routine (instead of cvShowImage) that uses OpenGL and just displays the image as an OpenGL texture. In this case you don't need to read the processed image back from the GPU to the CPU, and you can directly use an OpenGL texture/buffer as a CUDA image/buffer, so you don't even need to copy the image inside the GPU. But in this case you might have to manage CUDA resources yourself. With this method you might also use PBOs (pixel buffer objects) to upload the video into the texture and profit a bit from asynchrony.
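
The asynchrony part can be illustrated without any OpenGL or CUDA code. The sketch below is not PBO code; it swaps in plain std::async to show the same overlap idea: fetching frame k+1 while frame k is being processed. Frame and runOverlapped are hypothetical names, and grab and process stand in for whatever reads and handles a frame (for example, the capture read and the GPU work from the question). It assumes grab and process do not touch the same unsynchronized state.

```cpp
#include <functional>
#include <future>

// Hypothetical stand-in for a decoded video frame.
struct Frame {
    int index = -1;      // position in the stream
    bool valid = false;  // false signals end of stream
};

// Processes frames on the calling thread while the next frame is fetched
// on a worker thread. Only one grab is in flight at a time, so grab is
// never called concurrently with itself. Returns the number of frames
// processed.
template <typename Grab, typename Process>
int runOverlapped(Grab grab, Process process) {
    int processed = 0;
    // Start fetching the first frame.
    std::future<Frame> next = std::async(std::launch::async, std::ref(grab));
    for (;;) {
        Frame current = next.get();  // wait for the frame being fetched
        if (!current.valid)
            break;                   // end of stream, nothing left in flight
        // Kick off the fetch of the following frame...
        next = std::async(std::launch::async, std::ref(grab));
        // ...and process the current one in the meantime.
        process(current);
        ++processed;
    }
    return processed;
}
```

Whether this helps depends on how the time per frame splits between reading and processing; when one of the two dominates heavily, the overlap gains little.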
