How do I deal with a data race in OpenMP?


Problem description

I am trying to use OpenMP to add the numbers in an array. The following is my code:

  int* input = (int*) malloc (sizeof(int)*snum);
  int sum = 0;
  int i;
  for(i=0;i<snum;i++){
      input[i] = i+1;
  }
  #pragma omp parallel for schedule(static)
  for(i=0;i<snum;i++)
  {
      int* tmpsum = input+i;
      sum += *tmpsum;
  }

This does not produce the right result for sum. What's wrong?

Answer

Your code currently has a race condition, which is why the result is incorrect. To illustrate why this is, let's use a simple example:

You are running on 2 threads and the array is int input[4] = {1, 2, 3, 4};. You initialize sum to 0 correctly and are ready to start the loop. In the first iteration of the loop, thread 0 and thread 1 both read sum from memory as 0, add their respective elements to it, and write the result back to memory. This means thread 0 tries to write sum = 1 (the first element is 1, and sum = 0 + 1 = 1), while thread 1 tries to write sum = 2 (the second element is 2, and sum = 0 + 2 = 2). The final value depends on which thread finishes, and therefore writes to memory, last; that is a race condition. Not only that, but in this particular case neither of the values the code could produce is correct! There are several ways to get around this; I'll detail three basic ones below:

#pragma omp critical:

In OpenMP, there is what is called a critical directive. This restricts the code so that only one thread can do something at a time. For example, your for-loop can be written:

#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
#pragma omp critical
    sum += *tmpsum;
}

This eliminates the race condition, as only one thread accesses and writes to sum at a time. However, the critical directive is very bad for performance, and will likely kill most (if not all) of the gains you get from using OpenMP in the first place.

#pragma omp atomic:

The atomic directive is very similar to the critical directive. The major difference is that, while the critical directive applies to anything that you would like to do one thread at a time, the atomic directive only applies to memory read/write operations. As all we are doing in this code example is reading and writing to sum, this directive will work perfectly:

#pragma omp parallel for schedule(static)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
#pragma omp atomic
    sum += *tmpsum;
}

The performance of atomic is generally significantly better than that of critical. However, it is still not the best option in your particular case.

reduction:

The method you should use, and the method that has already been suggested by others, is reduction. You can do this by changing the for-loop to:

#pragma omp parallel for schedule(static) reduction(+:sum)
for(i = 0; i < snum; i++) {
    int *tmpsum = input + i;
    sum += *tmpsum;
}

The reduction clause tells OpenMP that, while the loop is running, you want each thread to keep track of its own copy of sum, and add them all up at the end of the loop. This is the most efficient method, as your entire loop now runs in parallel, with the only overhead right at the end of the loop, when the per-thread values of sum need to be added together.
