Python: using multiprocessing to further speed up a Cython function
Question
The code shown here is simplified, but it triggers the same PicklingError. I know there is a lot of discussion about what can and cannot be pickled, but I did not find a solution in it.
I wrote a simple Cython script with the following function:
def pow2(int a):
    return a**2
It compiles fine, and I can call this function from a Python script.
However, I am wondering how to use this function with multiprocessing:
import numpy as np
from multiprocessing import Pool
from fast import pow2

p = Pool(processes=4)
y = p.map(pow2, np.arange(10, dtype=int))
This gives me a PicklingError:
Here dtw is the name of the package, and fast refers to fast.pyx.
How can I get around this problem? Thanks in advance.
Recommended answer
Instead of using multiprocessing, which has to pickle the function and its data in order to pass them between processes, you can use prange, Cython's OpenMP wrapper. In your case you could use it as shown below.
- Note the use of x*x instead of x**2, which avoids the function call pow(x, 2).
- A part of the array is passed to each thread, using double * pointers.
- The last thread takes more values when size % num_threads != 0.
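The split described in the last two notes can be checked in plain Python before compiling anything. The sketch below mirrors how the answer's code fills its sub_sizes and pos arrays; the helper name `partition` is hypothetical and is not part of the answer.

```python
def partition(size, num_threads):
    """Return (sub_sizes, pos): chunk length and start offset per thread."""
    base = size // num_threads
    sub_sizes = [base] * num_threads
    # The last thread also takes the remainder.
    sub_sizes[-1] += size % num_threads
    # Each chunk starts where the previous one ends.
    pos = [0] * num_threads
    for t in range(1, num_threads):
        pos[t] = pos[t - 1] + sub_sizes[t - 1]
    return sub_sizes, pos


sub_sizes, pos = partition(10, 4)
print(sub_sizes)  # [2, 2, 2, 4]
print(pos)        # [0, 2, 4, 6]
```

With 10 elements and 4 threads, the first three threads get 2 values each and the last one gets 4, so every element is covered exactly once.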
Code:
#cython: wraparound=False
#cython: boundscheck=False
#cython: cdivision=True
#cython: nonecheck=False
#cython: profile=False
import numpy as np
cimport numpy as np
from cython.parallel import prange

# Square one contiguous chunk; runs without the GIL.
cdef void cpow2(int size, double *inp, double *out) nogil:
    cdef int i
    for i in range(size):
        out[i] = inp[i]*inp[i]

def pow2(np.ndarray[np.float64_t, ndim=1] inp,
         np.ndarray[np.float64_t, ndim=1] out,
         int num_threads=4):
    cdef int thread
    cdef np.ndarray[np.int32_t, ndim=1] sub_sizes, pos
    size = np.shape(inp)[0]
    # Chunk length per thread; the last thread also takes the remainder.
    sub_sizes = np.zeros(num_threads, np.int32) + size//num_threads
    pos = np.zeros(num_threads, np.int32)
    sub_sizes[num_threads-1] += size % num_threads
    # Start offset of each chunk.
    pos[1:] = np.cumsum(sub_sizes)[:num_threads-1]
    # One iteration (one chunk) per thread.
    for thread in prange(num_threads, nogil=True, chunksize=1,
                         num_threads=num_threads, schedule='static'):
        cpow2(sub_sizes[thread], &inp[pos[thread]], &out[pos[thread]])

def main():
    a = np.arange(642312323).astype(np.float64)
    pow2(a, out=a, num_threads=4)
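
prange only runs in parallel when the extension is compiled and linked with OpenMP enabled; without it the loop still runs, just serially. A minimal build script sketch follows. It is not part of the original answer and assumes the module lives in fast.pyx (the file name from the question), a GCC-compatible compiler that accepts -fopenmp (MSVC uses /openmp instead), and that setuptools, Cython and NumPy are installed.

```python
# setup.py: hypothetical build script, not part of the original answer
import numpy
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "fast",                              # module name assumed from the question
    sources=["fast.pyx"],
    include_dirs=[numpy.get_include()],  # headers needed for `cimport numpy`
    extra_compile_args=["-fopenmp"],     # GCC-style OpenMP flag (assumption)
    extra_link_args=["-fopenmp"],
)

setup(name="fast", ext_modules=cythonize([ext]))
```

Building in place with `python setup.py build_ext --inplace` should then produce a module that can be imported as fast.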