Is shared readonly data copied to different processes for multiprocessing?

Published: 2020/5/13 19:22:22 · Tags: python, numpy, multiprocessing


Question

The piece of code that I have looks somewhat like this:
from multiprocessing import Pool

glbl_array = ...  # a 3 GB array (contents elided in the question)

def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param
    pass

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))
Is there a way to make sure (or encourage) that the different processes do not each get a copy of glbl_array but share it? If there is no way to stop the copy, I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and do not want to annoy the sysadmin. Do you think it will help if the second parameter is a genuinely immutable object like glbl_array.tostring()?
Solution
You can use the shared-memory facilities of multiprocessing together with NumPy fairly easily:
import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i,:] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))

    print(shared_array)
which prints (the exact spacing depends on the NumPy version):
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]
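The example above relies on the workers inheriting the module-level buffer, which is what happens with the fork start method on Linux. A minimal Python 3 sketch of the same idea follows, with the start method pinned explicitly and the result checked in the parent. The names `ctx`, `N_ROWS`, `N_COLS`, `shared_base`, `shared`, `fill_row` and `run_demo` are illustrative and not part of the original answer, and `lock=False` is an assumption that is only safe here because each worker writes a different row.

```python
import ctypes
import multiprocessing as mp

import numpy as np

# Assumption: the "fork" start method (available on Linux), so that workers
# inherit the module-level shared buffer instead of re-creating it on import.
ctx = mp.get_context("fork")

N_ROWS, N_COLS = 10, 10  # illustrative sizes, matching the example above

# lock=False returns the raw ctypes array without a lock wrapper. Each worker
# writes a distinct row, so this sketch needs no synchronization.
shared_base = ctx.Array(ctypes.c_double, N_ROWS * N_COLS, lock=False)

# View the shared buffer as a NumPy array; no data is copied here.
shared = np.frombuffer(shared_base, dtype=np.float64).reshape(N_ROWS, N_COLS)


def fill_row(i):
    # Runs in a worker process; the write lands in the shared buffer.
    shared[i, :] = i


def run_demo():
    with ctx.Pool(processes=4) as pool:
        pool.map(fill_row, range(N_ROWS))
    # The parent never writes to `shared`, so non-zero rows here mean the
    # workers wrote into the same memory rather than into private copies.
    return shared.copy()


if __name__ == "__main__":
    print(run_demo())
```

If the workers had received private copies, the parent would still see an array of zeros after `pool.map` returns.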


However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
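For the read-only case in the question, the copy-on-write behaviour can be sketched as follows, assuming the fork start method. The array is a module-level global that the workers inherit; it is never passed as a task argument, because arguments to `pool.map` are pickled and would be copied. The array size and the names `partial_sum`, `run_demo` and `chunks` are hypothetical stand-ins, not taken from the original post.

```python
import multiprocessing as mp

import numpy as np

# Hypothetical stand-in for the large read-only array (kept small here).
glbl_array = np.arange(1_000_000, dtype=np.float64)


def partial_sum(bounds):
    # Only reads glbl_array. Under fork the worker sees the parent's pages,
    # and the kernel copies a page only if a process writes to it.
    start, stop = bounds
    return float(glbl_array[start:stop].sum())


def run_demo():
    # Assumption: a fork-capable OS such as Linux.
    ctx = mp.get_context("fork")
    # Only small (start, stop) tuples are sent to the workers, not the array.
    chunks = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with ctx.Pool(processes=4) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":
    print(run_demo())
```

With a start method other than fork, each worker would re-import the module and build its own `glbl_array`, so this sketch would no longer share the data.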
