How to share an array of objects in Python
Question
I have a function in which I create a pool of processes. Moreover, I use multiprocessing.Value() and multiprocessing.Lock() to manage some shared values between processes.
I want to do the same thing with an array of objects so that it is shared between processes, but I don't know how to do it. I will only read from that array.
Here is the function:
import glob
import os
from multiprocessing import Value, Pool, Lock, cpu_count

def predict(matches_path, unknown_path, files_path, imtodetect_path, num_query_photos,
            use_top3, uid, workbook, excel_file_path, modelspath, email_address):
    shared_correct_matched_imgs = Value('i', 0)
    shared_unknown_matched_imgs = Value('i', 0)
    shared_tot_imgs = Value('i', 0)
    counter = Value('i', 0)
    shared_lock = Lock()
    num_workers = cpu_count()
    feature = load_feature(modelspath)
    pool = Pool(initializer=init_globals,
                initargs=[counter, shared_tot_imgs, shared_correct_matched_imgs,
                          shared_unknown_matched_imgs, shared_lock],
                processes=num_workers)
    index = 0      # index and increment are used below but not defined
    increment = 1  # in the original snippet; initialized here for completeness
    for img in glob.glob(os.path.join(imtodetect_path, '*g')):
        pool.apply_async(predict_single_img,
                         (img, imtodetect_path, excel_file_path, files_path, use_top3,
                          uid, matches_path, unknown_path, num_query_photos, index, modelspath))
        index += increment
    pool.close()
    pool.join()
The array is created by the instruction feature = load_feature(modelspath). This is the array that I want to share.
In init_globals I initialize the shared values:
def init_globals(counter, shared_tot_imgs, shared_correct_matched_imgs,
                 shared_unknown_matched_imgs, shared_lock):
    global cnt, tot_imgs, correct_matched_imgs, unknown_matched_imgs, lock
    cnt = counter
    tot_imgs = shared_tot_imgs
    correct_matched_imgs = shared_correct_matched_imgs
    unknown_matched_imgs = shared_unknown_matched_imgs
    lock = shared_lock
Answer

The easy way of providing shared static data is simply to make it a global variable accessible to the function you want to call. If you're using an operating system which supports "fork", using global variables in child processes is very straightforward as long as they're constant (if you modify them, the changes won't be reflected in the other processes).
import multiprocessing as mp
from random import randint

shared = ['some', 'shared', 'data', f'{randint(0, 1_000_000)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("fork")
    # defining "shared" here would be valid also
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # same random number means "shared" is the same object
This won't work when using "spawn" as the start method (the only one available on Windows), because the memory of the parent is not shared in any way with the child, so the child must "import" the main file to gain access to whatever the target function is (this is also why you can run into problems with decorators). If you define your data outside the if __name__ == "__main__": block, it will kinda work, but you will have made separate copies of the data, which can be undesirable if it's big, slow to create, or can change each time it's created.
import multiprocessing as mp
from random import randint

shared = ['some', 'shared', 'data', f'{randint(0, 1_000_000)}']

def foo():
    print(' '.join(shared))

if __name__ == "__main__":
    mp.set_start_method("spawn")
    p = mp.Process(target=foo)
    p.start()
    p.join()
    print(' '.join(shared))  # different number means a different copy of "shared"
                             # (one in a million chance of being the same, I guess...)
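For the original question (sharing the read-only feature array with a Pool), the same initializer/initargs pattern already used for the Values and Lock should work under either start method: pass the array once per worker and stash it in a worker-side global. A minimal sketch, where init_worker, use_feature, and the placeholder data list are illustrative names rather than code from the question:

```python
import multiprocessing as mp

def init_worker(shared_feature):
    # Runs once in each worker process; stash the read-only data in a
    # module-level global so task functions can see it.
    global feature
    feature = shared_feature

def use_feature(i):
    # Worker tasks read from the global set up by the initializer.
    return feature[i]

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)   # also works with "fork"
    data = ['some', 'shared', 'data']          # stands in for load_feature(modelspath)
    with mp.Pool(processes=2, initializer=init_worker, initargs=(data,)) as pool:
        print(pool.map(use_feature, range(3)))  # prints ['some', 'shared', 'data']
```

Under "spawn" each worker receives its own pickled copy of the array (paid once per worker, not once per task); under "fork" the workers inherit it from the parent. Either way, read-only access is safe.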