OpenMP process time


Problem description

Hi everyone!
I'm working on a simple image processing program using OpenMP,
but I can't make it run as fast as it is supposed to.

The processing time of the code below was 0.5 seconds,
but with the OpenMP line commented out it was only 0.1 seconds.
I can't see what is going on.
Please tell me what I'm missing.

Thanks in advance!
-carlos

[machine info]
CPU: Intel Core 2 Quad 2.6 GHz
Memory: 4 GB
OS: Windows XP Professional

[code]

int width = 10000;
int length = 10000;
#pragma omp parallel for  // OpenMP line
for (int y = 0; y < length; y++) {
  int y_offset = y * width;
  const byte* source = SOURCE_IMAGE_ADDRESS_;
  source += y_offset;
  byte* destination = DESTINATION_IMAGE_ADDRESS_;
  destination += y_offset;
  for (int x = 0; x < width; x++) {
    *destination = (*source);
    source++;
    destination++;  
  }
}

Recommended answers

Did you gain a factor of 5 in performance? That is a better result than I would expect. How could you expect more if you only have 2 cores? Do you think parallel processing is a miracle that can draw power from nowhere? :-)

—SA


The inner loop variable x is shared by default if it is declared outside the parallel region. In that case you should make it private:
#pragma omp parallel for private(x)



This should boost performance, but I'm not sure if this is all.
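
To make the private(x) suggestion concrete, here is a minimal sketch of both ways to write the loop. The function names and the `byte` alias are assumptions for illustration, not part of the original code. Note that the clause only applies when x is declared before the pragma; a variable declared in the for statement itself, as in the posted code, is already private to each thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using byte = unsigned char;  // assumption: stands in for the question's `byte`

// Variant A: x is declared before the parallel region, so it would be
// shared by default. private(x) gives every thread its own copy.
void copy_rows_outer_x(const std::vector<byte>& src, std::vector<byte>& dst,
                       int width, int length) {
  assert(src.size() == dst.size());
  assert(src.size() >= static_cast<std::size_t>(width) * length);
  int x;
  #pragma omp parallel for private(x)
  for (int y = 0; y < length; y++) {
    const std::size_t row = static_cast<std::size_t>(y) * width;
    for (x = 0; x < width; x++) {
      dst[row + x] = src[row + x];
    }
  }
}

// Variant B: x is declared inside the loop body, so it is already
// private to each thread and no clause is needed.
void copy_rows_inner_x(const std::vector<byte>& src, std::vector<byte>& dst,
                       int width, int length) {
  assert(src.size() == dst.size());
  assert(src.size() >= static_cast<std::size_t>(width) * length);
  #pragma omp parallel for
  for (int y = 0; y < length; y++) {
    const std::size_t row = static_cast<std::size_t>(y) * width;
    for (int x = 0; x < width; x++) {
      dst[row + x] = src[row + x];
    }
  }
}
```

Both variants should produce the same copy. If the compiler is run without its OpenMP switch, the pragmas are ignored and the loops simply run serially.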


"simple image process program using open mp"

That is the first mistake: Multithreading is never simple!

The second problem is memory access. I don't know how clever OMP is about it, but I doubt it will recognize that for each y you access values from a very specific range of memory, that this range does not overlap with the ranges used for other values of y, and, more importantly, that it does not overlap with the memory range you write to! If so, every single memory access would have to be synchronized, and the majority of your code would in effect be serial, no matter whether you use OMP or not. Worse: the synchronization probably takes longer than the actual access!

I don't know OMP, but you must somehow tell it not to synchronize the memory accesses, either by specifying the memory ranges as local (or 'private', as used in solution 2?) or maybe through other options.
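
As a minimal sketch of one way to make the "these ranges don't overlap" assumption explicit in the code, each iteration below copies its own row with std::memcpy, which requires that source and destination do not overlap. The function name `copy_rows_memcpy` and the `byte` alias are assumptions for illustration. Whether this is actually faster on a given machine has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using byte = unsigned char;  // assumption: stands in for the question's `byte`

// Copies a width x length image one row per iteration.
// Every iteration touches only its own row, so the iterations are
// independent of each other. memcpy additionally requires that the
// source and destination ranges do not overlap.
void copy_rows_memcpy(const byte* src, byte* dst, int width, int length) {
  assert(src != dst);
  #pragma omp parallel for
  for (int y = 0; y < length; y++) {
    const std::size_t offset = static_cast<std::size_t>(y) * width;
    std::memcpy(dst + offset, src + offset, static_cast<std::size_t>(width));
  }
}
```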

There's more to it, e.g. each core will try to load the memory it accesses into its cache. If two (or more) cores access memory addresses that overlap, that cache needs to be synchronized every time this happens. You may not be aware of that, but if the memory addresses of your source and destination are close to each other, the cores may try to load the entire memory block that includes both addresses, and thus the caches would indeed overlap...

As I said: Multithreading is never simple!
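
Rather than guessing, it may help to measure both variants in one program. The following is a minimal sketch, assuming a C++11 compiler. The names `copy_image` and `time_copy_ms` are hypothetical, and std::vector buffers stand in for SOURCE_IMAGE_ADDRESS_ and DESTINATION_IMAGE_ADDRESS_. The `if` clause on the pragma switches the same loop between serial and parallel execution, and a warm-up pass touches all memory once so that the timed pass does not include first-access costs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using byte = unsigned char;  // assumption: stands in for the question's `byte`

// Copies a width x length image row by row, as in the question.
// With use_openmp == false the `if` clause keeps the loop serial.
void copy_image(const byte* src, byte* dst, int width, int length,
                bool use_openmp) {
  #pragma omp parallel for if(use_openmp)
  for (int y = 0; y < length; y++) {
    const std::size_t offset = static_cast<std::size_t>(y) * width;
    const byte* source = src + offset;
    byte* destination = dst + offset;
    for (int x = 0; x < width; x++) {
      destination[x] = source[x];
    }
  }
}

// Returns the wall-clock time of one copy in milliseconds.
double time_copy_ms(int width, int length, bool use_openmp) {
  const std::size_t n = static_cast<std::size_t>(width) * length;
  std::vector<byte> src(n, 7), dst(n, 0);
  // Warm-up pass: touches every page of both buffers once.
  copy_image(src.data(), dst.data(), width, length, use_openmp);
  const auto start = std::chrono::steady_clock::now();
  copy_image(src.data(), dst.data(), width, length, use_openmp);
  const auto stop = std::chrono::steady_clock::now();
  assert(dst == src);  // the copy must be correct in both modes
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_copy_ms(10000, 10000, false) and time_copy_ms(10000, 10000, true) a few times each gives numbers that can be compared directly. The pragma only has an effect when OpenMP is enabled in the compiler (for example /openmp with MSVC or -fopenmp with GCC).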

