在cython中有线程局部数组,以便我可以调整它们的大小? [英] Having thread-local arrays in cython so that I can resize them?
问题描述
我有一个间隔树算法,我想对许多使用线程的查询并行运行.问题在于每个线程都需要一个自己的数组,因为我事先无法知道会有多少次命中.
I have an interval-treeish algorithm I would like to run in parallel for many queries using threads. Problem is that then each thread would need its own array, since I cannot know in advance how many hits there will be.
还有其他类似的问题,建议的解决方案始终是使用大小为(K,t)的数组,其中K为输出长度,t为线程数.这对我不起作用,因为每个线程的K可能不同,并且每个线程可能需要调整数组的大小以适合其获得的所有结果.
There are other questions like this, and the solution suggested is always to have an array of size (K, t) where K is output length and t is number of threads. This does not work for me as K might be different for each thread and each thread might need to resize the array to fit all the results it gets.
伪代码:
for i in prange(len(starts)):
qs, qe, qx = starts[i], ends[i], index[i]
results = t.search(qs, qe)
if len(results) + nfound < len(output):
# add result to output
else:
# resize array
# then add results
推荐答案
通常的模式是每个线程都有自己的容器,这是速度/复杂度与内存开销之间的权衡:
An usual pattern is that every thread gets its own container, which is a trade-off between speed/complexity and memory-overhead:
- 不需要锁定就可以访问此容器,因为只有一个线程可以访问它.
- 与每个任务都有自己的容器(即每个
i
-值)"相比,开销要少得多.
- there is no need to lock for access to this container, because only one thread accesses it.
- there is much less overhead compared to "own container for every task (i.e. every
i
-value)".
在并行部分之后,必须在后处理步骤中将数据收集在最终容器中(也可以并行发生),或者后续算法应能够处理容器的收集.
After the parallel section, the data must be either collected in a final container in a post processing step (which also could happen in parallel) or the subsequent algorithms should be able to handle a collection of containers.
以下是使用c ++-vector的示例(已经具有内存管理功能并内置了增加的大小):
Here is an example using c++-vector (which already has memory management and increasing size built-in):
%%cython -+ -c=/openmp --link-args=/openmp
from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp
def calc_in_parallel(N):
cdef int i,k,tid
cdef int n = N
cdef vector[vector[int]] vecs
# every thread gets its own container
vecs.resize(openmp.omp_get_max_threads())
for i in prange(n, nogil=True):
tid = threadid()
for k in range(i):
# use container of the thread
vecs[tid].push_back(k) # dummy for calculation
return vecs
在许多情况下,将omp_get_max_threads()
用于线程数会高估实际线程数.在prange
中(即
Using omp_get_max_threads()
for the number of threads will overestimate the real number of threads in many cases. It is probably more robust to set the number of threads explicitly in prange
, i.e.
...
NUM_THREADS = 2
vecs.resize(NUM_THREADS)
for i in prange(n, nogil=True, num_threads = NUM_THREADS):
...
使用纯C可以应用类似的方法,但是在这种情况下将需要更多的样板代码(内存管理).
A similar approach can be applied using pure C, but more boiler plate code (memory management) will be needed in this case.
这篇关于在cython中有线程局部数组,以便我可以调整它们的大小?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!