How can I improve the performance of my OpenMP code?

Question

I am currently trying to improve the parallel performance of my code, and I am still new to OpenMP. I have to iterate over a large container, in each iteration reading from multiple entries and writing a result to a single entry. Below is a minimal code example of what I am trying to do.

data is a pointer to an array in which a lot of data points are stored. Before the parallel region I create an array newData, so I can use data as read-only and newData as write-only; afterwards I throw the old data away and use newData for further calculations. To my understanding, data and newData are shared between threads and everything declared inside the parallel region is private. Can reading from data by multiple threads cause performance issues?

I am using #pragma omp critical when assigning a new value to an element of newData to avoid race conditions. Is this necessary, given that I access every element of newData only once and never from multiple threads?

I am also not sure about scheduling. Do I have to specify whether I want a static or a dynamic schedule? Can I use nowait, since all threads are independent of each other?

array *newData = new array;

omp_set_num_threads (threads);

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0;  i < range; i++)
    {
        double middle = (*data)[i];
        double previous = (*data)[i-1];
        double next = (*data)[i+1];

        double new_value = (previous + middle + next) / 3.0;
        #pragma omp critical(assignment)
        (*newData)[i] = new_value;
    }
}

delete data;
data = newData;

I am aware that in the first and last iteration previous and next cannot be read from data. In the real code this is taken care of, but this minimal example conveys the idea of reading from data multiple times.

Recommended answer

  1. Multiple threads reading the same array usually does no harm.
  2. You only need a critical section if multiple threads work on the exact same piece of data. Here each thread writes to a different element of the array, so you don't need it. Critical sections are very bad for performance, so only use them if absolutely necessary. Often they can be replaced by atomic operations (see the question "openMP, atomic vs critical?"). Like a critical section, atomics make no sense if each thread accesses different data.
  3. For the schedule, it's best to try each kind and measure the performance, as predictions about performance are often wrong. Also try different chunk sizes.
  4. Some other things that might help:
    • Performance measurements are often disturbed by other tasks running on your PC, so take multiple measurements and use their minimum (unless the input is different each time; then take the average and do more measurements).
    • Do you really need double precision? Floats can be considerably faster, since they take half the memory and twice as many fit into a vector register.
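As a rough sketch of points 2 to 4, the loop from the question can be written without the critical section and timed several times, keeping the minimum. The function names smooth and best_time_seconds are hypothetical, std::vector stands in for the question's array type, and leaving the two end points unchanged is an assumption, since the question handles the borders elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Three-point average of each interior element. Copying the two end points
// unchanged is an assumption; the question handles the borders elsewhere.
std::vector<double> smooth(const std::vector<double>& data)
{
    std::vector<double> newData(data);  // end points keep their old values
    const long n = static_cast<long>(data.size());

    // data is only read and every iteration writes a different newData[i],
    // so no critical or atomic is needed. schedule(runtime) takes the
    // schedule kind and chunk size from the OMP_SCHEDULE environment variable.
    #pragma omp parallel for schedule(runtime)
    for (long i = 1; i < n - 1; ++i)
    {
        newData[i] = (data[i - 1] + data[i] + data[i + 1]) / 3.0;
    }
    return newData;
}

// Runs smooth() `repeats` times and returns the fastest wall-clock time
// in seconds, which is the measurement least disturbed by other tasks.
double best_time_seconds(const std::vector<double>& data, int repeats)
{
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        const std::vector<double> result = smooth(data);
        const auto stop = std::chrono::steady_clock::now();

        // Read one element so the call cannot be optimised away.
        volatile double sink = result.empty() ? 0.0 : result.back();
        (void)sink;

        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

With GCC or Clang this needs -fopenmp to run in parallel; without the flag the pragma is ignored and the loop runs serially. Setting OMP_SCHEDULE to values such as "static", "dynamic,1024" or "guided" before each run makes it possible to compare schedules and chunk sizes without recompiling.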
