为什么Opencv GPU代码比CPU慢? [英] Why Opencv GPU code is slower than CPU?

查看:562
本文介绍了为什么Opencv GPU代码比CPU慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我通过笔记本使用opencv242 + VS2010。

我试图对OpenCV中的GPU块进行一些简单的测试,但它显示GPU比CPU代码慢100倍。
在这段代码中,我只是将彩色图像转为灰度图像,使用 cvtColor

I'm using opencv242 + VS2010 by a notebook.
I tried to do some simple test of the GPU block in OpenCV, but it showed the GPU is 100 times slower than CPU codes. In this code, I just turn the color image to grayscale image, use the function of cvtColor

PART1是CPU代码(测试cpu RGB2GRAY),PART2是上传图像到GPU,PART3是GPU RGB2GRAY,PART4是CPU RGB2GRAY。
有3件事情让我想知道:

Here is my code, PART1 is CPU code(test cpu RGB2GRAY), PART2 is upload image to GPU, PART3 is GPU RGB2GRAY, PART4 is CPU RGB2GRAY again. There are 3 things makes me so wondering:

1在我的代码中,part1为0.3ms,而part4(与part1完全相同)为40ms !

2上传图像到GPU的part2是6000ms!

3 Part3(GPU代码)是11ms,这个简单的图像太慢了!

1 In my code, part1 is 0.3ms, while part4 (which is exactly same with part1) is 40ms!!!
2 The part2 which upload image to GPU is 6000ms!!!
3 Part3( GPU codes) is 11ms, it is so slow for this simple image!

    #include "StdAfx.h"
    #include <iostream>
    #include "opencv2/opencv.hpp"
    #include "opencv2/gpu/gpu.hpp"
    #include "opencv2/gpu/gpumat.hpp"
    #include "opencv2/core/core.hpp"
    #include "opencv2/highgui/highgui.hpp"
    #include <cuda.h>
    #include <cuda_runtime_api.h>
    #include <ctime>
    #include <windows.h>

    using namespace std;
    using namespace cv;
    using namespace cv::gpu;

    int main()
    {
        LARGE_INTEGER freq;
        LONGLONG QPart1,QPart6;
        double dfMinus, dfFreq, dfTim;
        QueryPerformanceFrequency(&freq);
        dfFreq = (double)freq.QuadPart;

        cout<<getCudaEnabledDeviceCount()<<endl;
        Mat img_src = imread("d:\\CUDA\\train.png", 1);

        // PART1 CPU code~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        // From color image to grayscale image.
        QueryPerformanceCounter(&freq);
        QPart1 = freq.QuadPart;
        Mat img_gray;
        cvtColor(img_src,img_gray,CV_BGR2GRAY);
        QueryPerformanceCounter(&freq);
        QPart6 = freq.QuadPart;
        dfMinus = (double)(QPart6 - QPart1);
        dfTim = 1000 * dfMinus / dfFreq;
        printf("CPU RGB2GRAY running time is %.2f ms\n\n",dfTim);

        // PART2 GPU upload image~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        GpuMat gimg_src;
        QueryPerformanceCounter(&freq);
        QPart1 = freq.QuadPart;
        gimg_src.upload(img_src);
        QueryPerformanceCounter(&freq);
        QPart6 = freq.QuadPart;
        dfMinus = (double)(QPart6 - QPart1);
        dfTim = 1000 * dfMinus / dfFreq;
        printf("Read image running time is %.2f ms\n\n",dfTim);

        GpuMat dst1;
        QueryPerformanceCounter(&freq);
        QPart1 = freq.QuadPart;

        /*dst.upload(src_host);*/
        dst1.upload(imread("d:\\CUDA\\train.png", 1));

        QueryPerformanceCounter(&freq);
        QPart6 = freq.QuadPart;
        dfMinus = (double)(QPart6 - QPart1);
        dfTim = 1000 * dfMinus / dfFreq;
        printf("Read image running time 2 is %.2f ms\n\n",dfTim);

        // PART3~ GPU code~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        // gpuimage From color image to grayscale image.
        QueryPerformanceCounter(&freq);
        QPart1 = freq.QuadPart;

        GpuMat gimg_gray;
        gpu::cvtColor(gimg_src,gimg_gray,CV_BGR2GRAY);

        QueryPerformanceCounter(&freq);
        QPart6 = freq.QuadPart;
        dfMinus = (double)(QPart6 - QPart1);
        dfTim = 1000 * dfMinus / dfFreq;
        printf("GPU RGB2GRAY running time is %.2f ms\n\n",dfTim);

        // PART4~CPU code(again)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

        // gpuimage From color image to grayscale image.
        QueryPerformanceCounter(&freq);
        QPart1 = freq.QuadPart;
        Mat img_gray2;
        cvtColor(img_src,img_gray2,CV_BGR2GRAY);
        BOOL i_test=QueryPerformanceCounter(&freq);
        printf("%d \n",i_test);
        QPart6 = freq.QuadPart;
        dfMinus = (double)(QPart6 - QPart1);
        dfTim = 1000 * dfMinus / dfFreq;
        printf("CPU RGB2GRAY running time is %.2f ms\n\n",dfTim);

        cvWaitKey();
        getchar();
        return 0;
    }


推荐答案

cvtColor

CPU上的cvColor代码使用SSE2指令同时处理多达8个像素,如果你有TBB它使用所有的核心/超线程,CPU是以GPU的时钟速度的10倍运行,最后你不必将数据复制到GPU和回。

The cvColor code on the CPU is using SSE2 instructions to process upto 8 pixels at once and if you have TBB it's using all the cores/hyperthreads, the CPU is running at 10x the clock speed of the GPU and finally you don't have to copy data onto the GPU and back.

这篇关于为什么Opencv GPU代码比CPU慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆