Multithreaded image processing in C++

Problem description

I am working on a program which manipulates images of different sizes. Many of these manipulations read pixel data from an input and write to a separate output (e.g. blur). This is done on a per-pixel basis.

Such image manipulations are very stressful on the CPU. I would like to use multithreading to speed things up. How would I do this? I was thinking of creating one thread per row of pixels.

I have several requirements:

  • Executable size must be minimized. In other words, I can't use massive libraries. What's the most light-weight, portable threading library for C/C++?
  • Executable size must be minimized. I was thinking of having a function forEachRow(fp* ) which runs a thread for each row, or even a forEachPixel(fp* ) where fp operates on a single pixel in its own thread. Which is best?
    • Should I use normal functions or functors or functionoids or some lambda functions or ... something else?
    • Some operations use optimizations which require information from the previous pixel processed. This makes forEachRow favorable. Would using forEachPixel be better even considering this?
  • Would I need to lock my read-only and write-only arrays?
    • The input is only read from, but many operations require input from more than one pixel in the array.
    • The output is only written once per pixel.
  • Speed is also important (of course), but optimizing executable size takes precedence.

Thanks.

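More information on this topic for the curious: C++ Parallelization Libraries: OpenMP vs. Thread Building Blocks (http://stackoverflow.com/questions/615264/c-parallelization-libraries-openmp-vs-thread-building-blocks)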

Solution

If your compiler supports OpenMP (I know VC++ 8.0 and 9.0 do, as does gcc), it can make things like this much easier to do.

You don't just want to make a lot of threads. There's a point of diminishing returns where adding new threads slows things down, because you start getting more and more context switches. At some point, using too many threads can actually make the parallel version slower than a plain sequential algorithm. The optimal number of threads is a function of the number of CPUs/cores available and the percentage of time each thread spends blocked on things like I/O. Take a look at this article by Herb Sutter for some discussion on parallel performance gains.
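As a rough sketch of that sizing rule for purely CPU-bound work, the helper below caps the worker count at the number of hardware threads and at the number of rows. The name chooseThreadCount is hypothetical, and the sketch assumes a C++11 compiler for std::thread::hardware_concurrency(), which is only a hint and may return 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: pick a worker count for CPU-bound, per-row work.
std::size_t chooseThreadCount(std::size_t rowCount)
{
    // hardware_concurrency() is only a hint and may return 0 if unknown.
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;

    // Never use more threads than rows, and always use at least one.
    return std::max<std::size_t>(1, std::min(hw, rowCount));
}
```

Threads that spend part of their time blocked on I/O may justify a higher count than this; for pixel loops that never block, more threads than cores mostly adds context switches.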

OpenMP lets you easily adapt the number of threads created to the number of CPUs available. Using it (especially in data-processing cases) often involves simply adding a few #pragma omp directives to existing code, and letting the compiler handle thread creation and synchronization.
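To check how many threads an OpenMP build would use by default, something like the following sketch can help. availableOpenMPThreads is a hypothetical helper name; omp_get_max_threads() is part of the standard OpenMP runtime API, and the _OPENMP guard lets the same file compile when OpenMP support is switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the threads a parallel region would use.
int availableOpenMPThreads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;  // compiled without OpenMP: everything runs on one thread
#endif
}
```

The default can usually be overridden without recompiling by setting the OMP_NUM_THREADS environment variable.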

In general - as long as data isn't changing, you won't have to lock read-only data. If you can be sure that each pixel slot will only be written once and you can guarantee that all the writing has been completed before you start reading from the result, you won't have to lock that either.
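If OpenMP is not an option, the same no-locking rule can be sketched with plain threads. The example below assumes a C++11 compiler with std::thread; forEachRow follows the name used in the question, but its signature here is made up for illustration, and invert is a hypothetical sample operation. Each worker gets its own band of rows, the input is only read, each output pixel is written exactly once, and join() ensures all writes are finished before the result is read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: call rowFn(y) for every row in [0, height),
// giving each worker thread one contiguous band of rows.
void forEachRow(std::size_t height, std::size_t threadCount,
                const std::function<void(std::size_t)>& rowFn)
{
    assert(threadCount > 0);
    threadCount = std::min(threadCount, std::max<std::size_t>(height, 1));
    const std::size_t band = (height + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(t * band, height);
        const std::size_t end = std::min(begin + band, height);
        if (begin == end) break;  // no rows left for further workers
        workers.emplace_back([begin, end, &rowFn] {
            for (std::size_t y = begin; y < end; ++y) rowFn(y);
        });
    }
    // Joining guarantees every write has completed before the caller reads.
    for (std::thread& w : workers) w.join();
}

// Sample operation: invert an 8-bit grayscale image without any locks.
std::vector<unsigned char> invert(const std::vector<unsigned char>& in,
                                  std::size_t width, std::size_t height,
                                  std::size_t threadCount)
{
    assert(in.size() == width * height);
    std::vector<unsigned char> out(in.size());
    forEachRow(height, threadCount, [&](std::size_t y) {
        for (std::size_t x = 0; x < width; ++x)
            out[y * width + x] =
                static_cast<unsigned char>(255 - in[y * width + x]);
    });
    return out;
}
```

On GCC and Clang this typically needs the -pthread flag. One thread per band rather than one per row or per pixel keeps the thread count near the core count.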

For OpenMP, there's no need to do anything special as far as functors / function objects are concerned. Write it whichever way makes the most sense to you. Here's an image-processing example from Intel (converts RGB to grayscale):

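#pragma omp parallel for
for (int i = 0; i < numPixels; i++)
{
   // Weighted sum of the channels (ITU-R BT.601 luma coefficients).
   // Each iteration reads one input pixel and writes one output pixel.
   pGrayScaleBitmap[i] = (unsigned char)
       (pRGBBitmap[i].red * 0.299 +
        pRGBBitmap[i].green * 0.587 +
        pRGBBitmap[i].blue * 0.114);
}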

By default, this splits the loop across about as many threads as you have CPUs, and typically assigns a contiguous section of the array to each thread.
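The same pragma can be applied per row for operations that read several input pixels or reuse results from the previous pixel, as the question describes. The following is a minimal sketch, not taken from the Intel article: boxBlurRows is a hypothetical helper that assumes an 8-bit grayscale image stored row by row. Rows are shared out between threads, while a running sum inside each row carries state from one pixel to the next.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: horizontal box blur over an 8-bit grayscale image.
// Windows are truncated at the left and right edges of each row.
std::vector<unsigned char> boxBlurRows(const std::vector<unsigned char>& in,
                                       int width, int height, int radius)
{
    assert(width > 0 && height > 0 && radius >= 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<unsigned char> out(in.size());

    // Rows are independent. Without OpenMP enabled, the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        const unsigned char* src = &in[static_cast<std::size_t>(y) * width];
        unsigned char* dst = &out[static_cast<std::size_t>(y) * width];

        // Sum of the window around the first pixel of the row.
        int sum = 0;
        for (int x = 0; x <= std::min(radius, width - 1); ++x) sum += src[x];

        for (int x = 0; x < width; ++x) {
            const int lo = std::max(x - radius, 0);
            const int hi = std::min(x + radius, width - 1);
            dst[x] = static_cast<unsigned char>(sum / (hi - lo + 1));

            // Slide the window one pixel right, reusing the previous sum.
            if (x + radius + 1 < width) sum += src[x + radius + 1];
            if (x - radius >= 0) sum -= src[x - radius];
        }
    }
    return out;
}
```

Because the running sum lives inside the row loop body, each thread has its own copy and no locking is needed. Enabling OpenMP takes a compiler switch such as -fopenmp for GCC or /openmp for Visual C++.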
