Having thread-local arrays in cython so that I can resize them?


Question

I have an interval-treeish algorithm I would like to run in parallel for many queries using threads. Problem is that then each thread would need its own array, since I cannot know in advance how many hits there will be.

There are other questions like this, and the solution suggested is always to have an array of size (K, t) where K is output length and t is number of threads. This does not work for me as K might be different for each thread and each thread might need to resize the array to fit all the results it gets.

Pseudocode:

for i in prange(len(starts)):
    qs, qe, qx = starts[i], ends[i], index[i]
    results = t.search(qs, qe)
    if len(results) + nfound < len(output):
        # add result to output
    else:
        # resize array
        # then add results

Answer

A usual pattern is that every thread gets its own container, which is a trade-off between speed/complexity and memory overhead:

  1. there is no need to lock access to this container, because only one thread accesses it.
  2. there is much less overhead compared to a separate container for every task (i.e. every i-value).

After the parallel section, the data must either be collected in a final container in a post-processing step (which could also happen in parallel), or the subsequent algorithms should be able to handle a collection of containers.
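
As a minimal sketch of such a post-processing step, assuming the parallel section left one Python list of hits per thread (the name per_thread_hits is purely illustrative):

def collect(per_thread_hits):
    # merge the per-thread containers into one flat result list;
    # the order across threads is arbitrary
    merged = []
    for hits in per_thread_hits:
        merged.extend(hits)
    return merged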

Here is an example using a C++ vector (which already has memory management and dynamic resizing built in):

%%cython -+ -c=/openmp --link-args=/openmp
# note: /openmp is the MSVC flag; with gcc/clang use -c=-fopenmp --link-args=-fopenmp

from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp

def calc_in_parallel(N):
    cdef int i, k, tid
    cdef int n = N
    cdef vector[vector[int]] vecs
    # every thread gets its own container
    vecs.resize(openmp.omp_get_max_threads())
    for i in prange(n, nogil=True):
        tid = threadid()
        for k in range(i):
            # use the container of the current thread
            vecs[tid].push_back(k)  # dummy for calculation

    return vecs
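
Once the cell has compiled, calc_in_parallel can be called like any Python function; it returns one list per thread (a small illustrative call, how the values are distributed over the threads depends on the OpenMP schedule):

res = calc_in_parallel(1000)
print(len(res))                      # one inner list per OpenMP thread
print(sum(len(v) for v in res))      # total number of pushed values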

Using omp_get_max_threads() for the number of threads will overestimate the real number of threads in many cases. It is probably more robust to set the number of threads explicitly in prange, i.e.

...
NUM_THREADS = 2
vecs.resize(NUM_THREADS)
for i in prange(n, nogil=True, num_threads=NUM_THREADS):
...

A similar approach can be applied using pure C, but more boilerplate code (memory management) will be needed in this case.
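
As a rough sketch of what that extra bookkeeping might look like (not the answer's code), here is an illustrative variant that keeps one growable malloc'ed buffer per thread and doubles its capacity when it runs full; names like IntBuf and calc_in_parallel_c are made up for this sketch, and error checks for malloc/realloc are omitted:

%%cython -c=/openmp --link-args=/openmp

from cython.parallel import prange, threadid
from libc.stdlib cimport malloc, realloc, free
cimport openmp

# one growable int buffer per thread
cdef struct IntBuf:
    int* data
    size_t size
    size_t capacity

def calc_in_parallel_c(N):
    cdef int i, k, t, tid
    cdef size_t j
    cdef int n = N
    cdef int num_threads = openmp.omp_get_max_threads()
    cdef IntBuf* bufs = <IntBuf*>malloc(num_threads * sizeof(IntBuf))

    # give every thread a small initial buffer
    for t in range(num_threads):
        bufs[t].data = <int*>malloc(16 * sizeof(int))
        bufs[t].size = 0
        bufs[t].capacity = 16

    for i in prange(n, nogil=True):
        tid = threadid()
        for k in range(i):
            # grow the buffer of this thread when it is full (doubling strategy)
            if bufs[tid].size == bufs[tid].capacity:
                bufs[tid].capacity = bufs[tid].capacity * 2
                bufs[tid].data = <int*>realloc(bufs[tid].data, bufs[tid].capacity * sizeof(int))
            bufs[tid].data[bufs[tid].size] = k  # dummy for calculation
            bufs[tid].size = bufs[tid].size + 1

    # copy the C buffers into Python lists, then release all C memory
    out = []
    for t in range(num_threads):
        lst = []
        for j in range(bufs[t].size):
            lst.append(bufs[t].data[j])
        out.append(lst)
        free(bufs[t].data)
    free(bufs)
    return out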
