Understanding this race condition in numba parallelization

Question

The Numba documentation has an example of a race condition in parallel code:

import numba as nb
import numpy as np
@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y[:] += x[i]
    return y

I ran it, and it does produce an incorrect result such as:

prange_wrong_result(np.ones(10000))
#array([5264., 5273., 5231., 5234.])

Then I tried changing the loop to:

import numba as nb
import numpy as np
@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y += x[i]
    return y

It outputs:

prange_wrong_result(np.ones(10000))
#array([10000., 10000., 10000., 10000.])

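I have read some explanations of race conditions, but I still don't understand:

  1. Why is there no race condition in the second example?
  2. What is the difference between y[:]= and y=?
  3. Why do the four elements of the output differ in the first example?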

Recommended answer

In your first example, multiple threads share the same array, and each of them reads from and assigns to it. The statement y[:] += x[i] is roughly equivalent to:

y[0] += x[i]
y[1] += x[i]
y[2] += x[i]
y[3] += x[i]

In fact, += is just syntactic sugar for a read, an addition, and an assignment, so y[0] += x[i] is really:

_value = y[0]
_value = _value + x[i]
y[0] = _value

The loop body is executed simultaneously by multiple threads, and that is where the race condition comes in. The example in the Wikipedia article on race conditions applies here: two threads each read the same value, add to it, and write the result back, so one of the two updates is lost.
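The lost update can be replayed deterministically without any real threads. The following is a minimal sketch in plain Python, not numba code: the helper `run` and the hard-coded schedules are hypothetical, introduced only to show how interleaving the read, add, and write steps of two workers drops one increment.

```python
# Deterministic replay of a lost update. Two simulated threads, "A" and "B",
# each perform read -> add -> write on the same shared slot (think y[0]).
# `run` and the schedules below are illustrative helpers, not part of numba.
def run(schedule):
    shared = [0.0]   # stands in for y[0]
    local = {}       # each simulated thread's private _value
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared[0]
        elif step == "add":
            local[thread] = local[thread] + 1.0   # x[i] is 1.0 here
        elif step == "write":
            shared[0] = local[thread]
    return shared[0]

STEPS = ("read", "add", "write")

# A finishes all three steps before B starts: both increments survive.
sequential = [("A", s) for s in STEPS] + [("B", s) for s in STEPS]

# Both threads read before either writes: B overwrites A's result.
interleaved = [
    ("A", "read"), ("B", "read"),
    ("A", "add"), ("B", "add"),
    ("A", "write"), ("B", "write"),
]

print(run(sequential))   # 2.0
print(run(interleaved))  # 1.0
```

With real threads the schedule is chosen by the operating system rather than written down in advance, which is why the totals in the question change from run to run.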

That is why the returned array contains wrong values and why each element may be different: it is simply non-deterministic which thread runs when. So the race condition sometimes hits one element, sometimes none, and sometimes several.

However, the numba developers have implemented some supported reductions in which no race condition occurs. One of them is y +=. The important point is that the operation is applied to the variable itself rather than to a slice or element of the variable. In that case numba does something very clever: it copies the initial value of the variable for each thread and lets each thread operate on its own copy. After the parallel loop has finished, it adds up the copies. Taking your second example and assuming it used 2 threads, it would look roughly like this:

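# Conceptual pseudocode, not runnable: is_process_1 / is_process_2 stand for
# "this iteration was assigned to worker 1 / worker 2".
y = np.zeros(4)
y_1 = y.copy()   # private copy for worker 1
y_2 = y.copy()   # private copy for worker 2
for i in nb.prange(n):
    if is_process_1:
        y_1[:] += x[i]
    if is_process_2:
        y_2[:] += x[i]
# After the loop, the private copies are combined.
y += y_1
y += y_2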

Since each thread has its own array, there is no potential for a race condition. For numba to be able to deduce this, you have to follow its restrictions. The documentation states that numba creates race-condition-free parallel code for += on scalars and arrays (y += x[i]), but not on array elements or slices (y[:] += x[i] or y[1] += x[i]).
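The private-copy idea described above can be emulated in plain Python to check that it gives the expected total. This is only a sketch under stated assumptions, not numba's actual implementation: `sum_with_private_copies`, `num_workers`, and the round-robin assignment of iterations are hypothetical, and Python lists stand in for NumPy arrays so the code runs without numba.

```python
# Emulation of a reduction with one private accumulator per worker.
# All names here are illustrative; numba's real scheduling is not modeled.
def sum_with_private_copies(x, num_workers=2, width=4):
    y = [0.0] * width                                 # like np.zeros(4)
    private = [list(y) for _ in range(num_workers)]   # like y.copy() per worker
    for i, value in enumerate(x):
        worker = i % num_workers      # assumed round-robin assignment
        acc = private[worker]         # only this worker touches acc
        for k in range(width):
            acc[k] += value
    # Combine step, executed once after the "parallel" loop is done.
    for acc in private:
        for k in range(width):
            y[k] += acc[k]
    return y

print(sum_with_private_copies([1.0] * 10000))
# [10000.0, 10000.0, 10000.0, 10000.0]
```

Because no accumulator is written by more than one worker, the order in which the workers run cannot change the result, which matches the output of the second example in the question.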
