Python, use multiprocessing to further speed up a Cython function


Question

The code shown here is simplified but triggers the same PicklingError. I know there is a lot of discussion about what can and cannot be pickled, but I did not find a solution in it.

I wrote a simple Cython script with the following function:

def pow2(int a):
    return a**2

It compiles fine, and I can call this function from a Python script.

However, I am wondering how to use this function with multiprocessing:

import numpy as np
from multiprocessing import Pool
from fast import pow2

p = Pool(processes=4)
y = p.map(pow2, np.arange(10, dtype=int))

This gives me a PicklingError:

dtw is the name of the package, and fast is the module built from fast.pyx.

How can I get around this problem? Thanks in advance.

Recommended answer

Instead of using multiprocessing, which has to pickle the function and its arguments in order to send them to the worker processes, you can use prange, Cython's OpenMP wrapper. In your case you could use it as shown below.

  • note the use of x*x instead of x**2, which avoids the function call pow(x, 2)
  • a part of the array is passed to each thread, using pointers to double (double *)
  • the last thread takes more values when size % num_threads != 0

Code:

#cython: wraparound=False
#cython: boundscheck=False
#cython: cdivision=True
#cython: nonecheck=False
#cython: profile=False
import numpy as np
cimport numpy as np
from cython.parallel import prange

cdef void cpow2(int size, double *inp, double *out) nogil:
    cdef int i
    for i in range(size):
        out[i] = inp[i]*inp[i]

def pow2(np.ndarray[np.float64_t, ndim=1] inp,
         np.ndarray[np.float64_t, ndim=1] out,
         int num_threads=4):
    cdef int thread
    cdef np.ndarray[np.int32_t, ndim=1] sub_sizes, pos
    size = np.shape(inp)[0]
    sub_sizes = np.zeros(num_threads, np.int32) + size//num_threads
    pos = np.zeros(num_threads, np.int32)
    sub_sizes[num_threads-1] += size % num_threads
    pos[1:] = np.cumsum(sub_sizes)[:num_threads-1]
    for thread in prange(num_threads, nogil=True, chunksize=1,
                         num_threads=num_threads, schedule='static'):
        cpow2(sub_sizes[thread], &inp[pos[thread]], &out[pos[thread]])

def main():
    a = np.arange(642312323).astype(np.float64)
    pow2(a, out=a, num_threads=4)
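The way pow2 splits the array between threads can be checked without compiling anything. The following is a minimal pure-Python sketch of the same arithmetic, not part of the answer's code: the helper name split_chunks is hypothetical, and it uses plain lists where the Cython version uses NumPy int32 arrays.

```python
from itertools import accumulate


def split_chunks(size, num_threads):
    """Mirror the sub_sizes / pos computation of the Cython pow2 above.

    split_chunks is a hypothetical helper used only for illustration.
    Assumes num_threads >= 1.
    """
    base, remainder = divmod(size, num_threads)
    # Every thread gets size // num_threads elements ...
    sub_sizes = [base] * num_threads
    # ... and the last thread also takes the size % num_threads leftovers.
    sub_sizes[-1] += remainder
    # Start offset of each chunk: running total of the preceding chunk sizes.
    pos = [0] + list(accumulate(sub_sizes))[:-1]
    return sub_sizes, pos


sub_sizes, pos = split_chunks(10, 4)
print(sub_sizes)  # [2, 2, 2, 4]
print(pos)        # [0, 2, 4, 6]
```

With this scheme the last thread processes at most num_threads - 1 more elements than the others, which is negligible for large arrays. When size is smaller than num_threads, however, all the work falls to the last thread.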
