DFS的Cython并行竞争条件 [英] Cython parallelisation race condition for DFS
问题描述
我正在尝试开发一种AI,以最佳方式玩1人桌游.我正在使用深度优先搜索功能.
I'm attempting to develop an AI to play a 1-player board game optimally. I'm using a depth-first search to a few levels.
我尝试通过对所有循环进行迭代的初始循环进行多线程处理并递归到游戏树中来加快速度.我的想法是,每个线程都会将可能的初始移动板分割成多个块,并在单独的递归函数中进一步进行评估.调用的所有函数都是nogil
I've attempted to speed it up by multithreading the initial loop iterating over all moves and recursing into the game trees. My idea is that each thread will split-up the initial possible move boards into chunks and further evaluate these in a separate recursive function. All functions called are nogil
但是,由于多线程解决方案给出了不同的结果,所以我只能猜测是一种竞争状况,而且我不确定该如何解决.
However, I'm encountering what I can only guess is a race condition because the multi-threaded solution gives different results, and I'm not sure how to go about fixing it.
cdef struct Move:
int x
int y
int score
cdef Move search( board_t& board, int prevClears, int maxDepth, int depth ) nogil:
cdef Move bestMove
cdef Move recursiveMove
cdef vector[ Move ] moves = generateMoves( board )
cdef board_t nextBoard
cdef int i, clears
bestMove.score = 0
# Split the initial possible move boards amongst threads
for i in prange( <int> moves.size(), nogil = True ):
# Applies move and calculates the move score
nextBoard = applyMove( board, moves[ i ], prevClears, maxDepth, depth )
# Recursively evaluate further moves
if maxDepth - depth > 0:
clears = countClears( nextBoard )
recursiveMove = recursiveSearch( nextBoard, moves[ i ], clears, maxDepth, depth + 1 )
moves[ i ].score += recursiveMove.score
# Update bestMove
if moves[ i ].score > bestMove.score:
bestMove = moves[ i ]
return bestMove
推荐答案
当涉及到prange
时,Cython会做一些魔术,这取决于微妙的事情-因此,人们真的必须查看生成的C代码才能理解什么继续.
Cython does some magic, which depends on subtle things, when prange
is involved - so one really has to look at the resulting C code to understand what is going on.
据我所见,您的代码至少有两个问题.
As far as I can see your code, there are at least 2 problems.
1.问题: bestMove
未初始化.
%%cython -+
cdef struct Move:
...
def foo()
cdef Move bestMove
return bestMove
将导致以下C代码:
...
struct __pyx_t_XXX_Move __pyx_v_bestMove;
...
__pyx_r = __pyx_convert__to_py_struct____pyx_t_XXX_Move(__pyx_v_bestMove); if ...
return __pyx_r;
局部变量__pyx_v_bestMove
将保持未初始化状态(例如,参见 SO-post ),即使它很好可能的是,初始值将仅包含零.
The local variable __pyx_v_bestMove
will stay uninitialized (see e.g. this SO-post), even if it is well possible, that the initial value will consist only out of zeros.
例如,如果是bestMove
int,Cython会发出警告,但不适用于结构.
Were bestMove
for example an int, Cython would give a warning, but it doesn't for structs.
2.问题:分配bestMove
会导致赛车状态.
2. Problem: assigning bestMove
leads to racing condition.
顺便说一句,结果可能不仅是最佳举动,而且甚至是非法举动,因为它可能是以下各项的组合(来自不同法律举动的x
-,y
-,score
-值)其他指定的法律举动.
Btw, the result might not only be not the best move, but even an illegal move alltogether as it could be a combination (x
-,y
-,score
- values from different legal moves) of other assigned legal moves.
这里是此问题的较小复制者:
Here is a smaller reproducer of the issue:
%%cython -c=-fopenmp --link-args=-fopenmp
# cython
cimport cython
from cython.parallel import prange
cdef struct A:
double a
@cython.boundscheck(False)
def search_max(double[::1] vals):
cdef A max_val = [-1.0] # initialized!
cdef int i
cdef int n = len(vals)
for i in prange(n, nogil=True):
if(vals[i]>max_val.a):
max_val.a = vals[i]
return max_val.a
是max_val
,是cdef double
Cython不会构建它,因为它会尝试将max_val
设为私有(巧妙).但是现在,max_val
在线程之间共享(请参见生成的C代码),并且应该保护对它的访问.如果没有,我们可以看到结果(一个人可能需要运行多次才能触发竞争条件)
Were max_val
a cdef double
Cython wouldn't build it as it would try to make max_val
private (subtly magic). But now, max_val
is shared between threads (see resulting C-code) and the access to it should be guarded. If not we can see (one might need to run multiple times to trigger the race condition) the result:
>>> import numpy as np
>>> a = np.random.rand(1000)
>>> search_max(a)-search_max(a)
#0.0006253360398751351 but should be 0.0
该怎么办?按照@DavidW的建议,我们可以收集每个线程的最大值,然后在后期处理步骤中找到绝对最大值-请参见此 SO帖子,导致:
What can be done? As @DavidW has proposed, we could collect maximum per thread and then find absolute maximum in a post process step - see this SO-post, which leads to:
%%cython -+ -c=-fopenmp --link-args=-fopenmp
cimport cython
from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp
cdef struct A:
double a
@cython.boundscheck(False)
def search_max(double[::1] vals):
cdef int i, tid
cdef int n = len(vals)
cdef vector[A] max_vals
# every thread gets its own max value:
NUM_THREADS = 4
max_vals.resize(NUM_THREADS, [-1.0])
for i in prange(n, nogil=True, num_threads = NUM_THREADS):
tid = threadid()
if(vals[i]>max_vals[tid].a):
max_vals[tid].a = vals[i]
#post process, collect results of threads:
cdef double res = -1.0
for i in range(NUM_THREADS):
if max_vals[i].a>res:
res = max_vals[i].a
return res
我认为将openmp功能与C/C ++一起使用并用Cython包装生成的代码更容易且更容易出错:Cython不仅不
I think it is easier and less error prone to use openmp functionality with C/C++ and wrap the resulting code with Cython: Not only doesn't Cython support everything what openmp offers, but seeing problems in parallel code is hard enough when looking at simple C-code, without any implicit magic done by Cython.
这篇关于DFS的Cython并行竞争条件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!