GPU with OpenCL is slower than CPU. Why?


Question

Environment:

  • Intel i7-9750H
  • Intel UHD Graphics 630
  • Nvidia GTX1050 (laptop)
  • Visual Studio 2019 / C++
  • OpenCV 4.4
  • OpenCL 3.0 (Intel) / 1.2 (Nvidia)

I'm trying to use OpenCL to speed up my code, but the results show that the CPU is faster than the GPU. How can I speed up my code?

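#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>

void GetHoughLines(cv::Mat dst) {
    cv::ocl::setUseOpenCL(true); // enable OpenCV's OpenCL code path

    int img_w = dst.size().width; // 5000
    int img_h = dst.size().height; // 4000

    // Wrap the input Mat as a UMat and create a zero-filled UMat of the same size.
    cv::UMat tmp_dst = dst.getUMat(cv::ACCESS_READ);
    cv::UMat tmp_mat = cv::UMat(dst.size(), CV_8UC1, cv::Scalar(0));

    // Element-wise multiply, repeated 1000 times as the benchmark workload.
    for (size_t i = 0; i < 1000; i++)
    {
        tmp_mat = tmp_mat.mul(tmp_dst);
    }
}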

It took about 3000 ms when I used only the CPU. With the Intel UHD Graphics 630 it took 3500 ms. I also tried the GTX1050, but it still took about 3000 ms.

Please give me some ideas for speeding it up. I need to bring the run time down to about 1000 ms at most. Should I use AMP or OpenMP? As far as I know, they can only handle simple operations and are not suitable for OpenCV functions.

Recommended answer

Basically, your code is slow because the way OpenCV uses OpenCL is inefficient. It has nothing to do with the underlying hardware.

For OpenCL code (or any GPU code, for that matter) to be efficient, it is crucial that the host-side code uses the GPU properly. To name a few principles:

  • Saturate the GPU by asynchronously enqueuing many computations (kernels).
  • Avoid unnecessary synchronizations.
  • Avoid unnecessary memory copies between the host CPU and the GPU device.

Even if you write the most optimized GPU kernels, you are very unlikely to gain any performance if you fail to follow these basics.

The OpenCV codebase is a great example of how not to adhere to these principles.

As for your example, if you rewrite your code to avoid memory copies and use device memory explicitly, you may see reasonable performance:

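// Size and type taken from the question (5000 x 4000, 8-bit single channel).
const cv::Size size(5000, 4000);
const int format = CV_8UC1;

// Allocate all three buffers in device memory so the loop needs no host copies.
auto frame1 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
auto frame2 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);
auto frame3 = cv::UMat(size, format, cv::USAGE_ALLOCATE_DEVICE_MEMORY);

// Each iteration reads frame1 and frame2 and writes frame3, all on the device.
for (size_t i = 0; i < 10; i++)
{
    cv::multiply(frame1, frame2, frame3);
}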

But in any case, I recommend that you learn to use the OpenCL API directly, without OpenCV.
