Writing to dictionary of objects in parallel

Problem description

I have a dictionary of objects, and I would like to populate this dictionary using the multiprocessing package. This bit of code runs "Run" many times in parallel.

from multiprocessing import Process

Data = dict()
for i in range(n):  # n: however many parallel runs are wanted
    Data[i] = dataobj(i)  # dataobj is a class I have defined elsewhere
    proc = Process(target=Run, args=(i, Data[i]))
    proc.start()

Where "Run" does some simulations, and saves the output in the dataobj object

def Run(i, out):
    [...some code to run simulations....]
    out.extract(file)

My code creates a dictionary of objects, and then modifies the objects in that dictionary in parallel. Is this possible, or do I need to acquire a lock every time I modify an object in the shared dictionary?

Recommended answer

Basically, as you're using multiprocessing, each of your processes gets its own copy of the original dictionary of objects, and thus populates a different one. What the multiprocessing package handles for you is the messaging of Python objects between processes, to make things less painful.
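
To see this copy semantics in action, here is a minimal sketch (not part of the original question) showing that a mutation made inside a child process never reaches the parent's dictionary:

from multiprocessing import Process

def mutate(d):
    d["x"] = 42  # this only changes the child's private copy

if __name__ == "__main__":
    d = {}
    p = Process(target=mutate, args=(d,))
    p.start()
    p.join()
    print(d)  # prints {} -- the parent's dictionary is unchanged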

A good design for your problem is to have the main process handle populating the dictionary, and have its child processes handle the work. Then use queues to exchange data between the child processes and the master process.

As a general design idea, here's something that could be done:

from multiprocessing import Process, Queue

# one (input, output) queue pair per worker process
queues = [(Queue(), Queue()), (Queue(), Queue())]

def simulate(qin, qout):
    while not qin.empty():
        data = qin.get()
        # work with the data
        qout.put(data)
    # when the input queue is empty, the process ends

# first send the data to be processed to the children processes
while data.there_is_more_to_process():
    # here you have to adapt to your context how you want to split
    # the load between your processes
    queues[0][0].put(data.pop_some_data())
    queues[1][0].put(data.pop_some_data())

# start the workers only once their input queues are filled, so the
# empty() check above does not make them exit immediately
procs = [Process(target=simulate, args=q) for q in queues]
for p in procs:
    p.start()

processed_data_list = []

# then, for each process' queue pair,
for qin, qout in queues:
    # you populate your output data list (or dict or whatever)
    while not qout.empty():
        processed_data_list.append(qout.get())
# here again, you have to adapt to your context how you handle the data
# sent back from the children processes.

Take it as only a design idea, though: this code has a few design flaws (for instance, nothing waits for the worker processes to finish before their output queues are drained), which will get naturally solved when working with real data and processing functions.
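
Applied back to the original question, the same pattern could look roughly like this. This is only a sketch, under the assumption that whatever Run extracts is picklable so it can travel over a queue: each child sends back an (i, result) pair, and the main process is the only one that ever writes to Data.

from multiprocessing import Process, Queue

def Run(i, qout):
    # ...some code to run simulations...
    result = "output of simulation %d" % i  # placeholder for the real output
    qout.put((i, result))  # send the result back instead of mutating a shared object

if __name__ == "__main__":
    qout = Queue()
    procs = [Process(target=Run, args=(i, qout)) for i in range(4)]
    for p in procs:
        p.start()

    Data = {}
    for _ in procs:             # exactly one result per child
        i, result = qout.get()  # blocks until some child sends its result
        Data[i] = result
    for p in procs:
        p.join()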
