GPU versions of OpenCV algorithms slower than CPU versions on my machine?


Question

While trying to speed up a simple algorithm on the GPU with OpenCV, I noticed that on my machine (Ubuntu 12.10, NVIDIA 9800GT, CUDA 4.2.9, g++ 4.7.2) the GPU version is actually slower than the CPU version. I tested with the following code.

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

#include <chrono>
#include <iostream>

int main()
{
    using namespace cv;
    using namespace std;

    Mat img1(512, 512, CV_32FC3, Scalar(0.1f, 0.2f, 0.3f));
    Mat img2(128, 128, CV_32FC3, Scalar(0.2f, 0.3f, 0.4f));
    Mat img3(128, 128, CV_32FC3, Scalar(0.3f, 0.4f, 0.5f));

    auto startCPU = chrono::high_resolution_clock::now();
    double resultCPU(0.0);
    cout << "CPU ... " << flush;
    for (int y(0); y < img2.rows; ++y)
    {
        for (int x(0); x < img2.cols; ++x)
        {
            Mat roi(img1(Rect(x, y, img2.cols, img2.rows)));
            Mat diff;
            absdiff(roi, img2, diff);
            Mat diffMult(diff.mul(img3));
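            // Note: this sums diff rather than diffMult, whereas the GPU loop
            // below sums diffMultGPU, so the two printed checksums differ.
            Scalar diffSum(sum(diff));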
            double diffVal(diffSum[0] + diffSum[1] + diffSum[2]);
            resultCPU += diffVal;
        }
    }
    auto endCPU = chrono::high_resolution_clock::now();
    auto elapsedCPU = endCPU - startCPU;
    cout << "done. " << resultCPU << " - ticks: " << elapsedCPU.count() << endl;

    gpu::GpuMat img1GPU(img1);
    gpu::GpuMat img2GPU(img2);
    gpu::GpuMat img3GPU(img3);
    gpu::GpuMat diffGPU;
    gpu::GpuMat diffMultGPU;
    gpu::GpuMat sumBuf;

    double resultGPU(0.0);
    auto startGPU = chrono::high_resolution_clock::now();
    cout << "GPU ... " << flush;
    for (int y(0); y < img2GPU.rows; ++y)
    {
        for (int x(0); x < img2GPU.cols; ++x)
        {
            gpu::GpuMat roiGPU(img1GPU, Rect(x, y, img2GPU.cols, img2GPU.rows));
            gpu::absdiff(roiGPU, img2GPU, diffGPU);
            gpu::multiply(diffGPU, img3GPU, diffMultGPU);
            Scalar diffSum(gpu::sum(diffMultGPU, sumBuf));
            double diffVal(diffSum[0] + diffSum[1] + diffSum[2]);
            resultGPU += diffVal;
        }
    }
    auto endGPU = chrono::high_resolution_clock::now();
    auto elapsedGPU = endGPU - startGPU;
    cout << "done. " << resultGPU << " - ticks: " << elapsedGPU.count() << endl;
}

My results are as follows:

CPU ... done. 8.05306e+07 - ticks: 4028470
GPU ... done. 3.22122e+07 - ticks: 5459935

If this helps: my profiler (System Profiler 1.1.8) tells me that most of the time is spent in cudaDeviceSynchronize.

Am I doing something fundamentally wrong in the way I use the OpenCV GPU functions, or is my GPU just slow?

Recommended answer

Thanks to the comments from hubs and Eric, I was able to change my test so that the GPU version actually became faster than the CPU version. The mistake that led to the different checksums in the two versions has now been eliminated as well. ;-)

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

#include <chrono>
#include <iostream>

int main()
{
    using namespace cv;
    using namespace std;

    Mat img1(512, 512, CV_32FC3, Scalar(1.0f, 2.0f, 3.0f));
    Mat img2(128, 128, CV_32FC3, Scalar(4.0f, 5.0f, 6.0f));
    Mat img3(128, 128, CV_32FC3, Scalar(7.0f, 8.0f, 9.0f));
    Mat resultCPU(img2.rows, img2.cols, CV_32FC3, Scalar(0.0f, 0.0f, 0.0f));

    auto startCPU = chrono::high_resolution_clock::now();
    cout << "CPU ... " << flush;
    for (int y(0); y < img1.rows - img2.rows; ++y)
    {
        for (int x(0); x < img1.cols - img2.cols; ++x)
        {
            Mat roi(img1(Rect(x, y, img2.cols, img2.rows)));
            Mat diff;
            absdiff(roi, img2, diff);
            Mat diffMult(diff.mul(img3));
            resultCPU += diffMult;
        }
    }
    auto endCPU = chrono::high_resolution_clock::now();
    auto elapsedCPU = endCPU - startCPU;
    Scalar meanCPU(mean(resultCPU));
    cout << "done. " << meanCPU << " - ticks: " << elapsedCPU.count() << endl;

    gpu::GpuMat img1GPU(img1);
    gpu::GpuMat img2GPU(img2);
    gpu::GpuMat img3GPU(img3);
    gpu::GpuMat diffGPU(img2.rows, img2.cols, CV_32FC3);
    gpu::GpuMat diffMultGPU(img2.rows, img2.cols, CV_32FC3);
    gpu::GpuMat resultGPU(img2.rows, img2.cols, CV_32FC3, Scalar(0.0f, 0.0f, 0.0f));

    auto startGPU = chrono::high_resolution_clock::now();
    cout << "GPU ... " << flush;
    for (int y(0); y < img1GPU.rows - img2GPU.rows; ++y)
    {
        for (int x(0); x < img1GPU.cols - img2GPU.cols; ++x)
        {
            gpu::GpuMat roiGPU(img1GPU, Rect(x, y, img2GPU.cols, img2GPU.rows));
            gpu::absdiff(roiGPU, img2GPU, diffGPU);
            gpu::multiply(diffGPU, img3GPU, diffMultGPU);
            gpu::add(resultGPU, diffMultGPU, resultGPU);
        }
    }
    auto endGPU = chrono::high_resolution_clock::now();
    auto elapsedGPU = endGPU - startGPU;
    Mat downloadedResultGPU(resultGPU);
    Scalar meanGPU(mean(downloadedResultGPU));
    cout << "done. " << meanGPU << " - ticks: " << elapsedGPU.count() << endl;
}

Output:

CPU ... done. [3.09658e+06, 3.53894e+06, 3.98131e+06, 0] - ticks: 34021332
GPU ... done. [3.09658e+06, 3.53894e+06, 3.98131e+06, 0] - ticks: 20609880

That is not the speedup I expected, but my GPU is probably just not the best for this kind of work. Thanks, guys.
