Python multiprocess dict of list


Question

I need to do some work in multiple processes with Python 3.6. Specifically, I have to update a dict by adding lists of objects to it. Since these objects are unpicklable, I need to use dill instead of pickle and multiprocess from pathos instead of multiprocessing, but this should not be the problem.

Adding a list to the dictionary requires reserializing the list before it is added. This slows everything down, so it takes the same time as without multiprocessing. Could you suggest a workaround?

This is my code with Python 3.6: init1 works but is slow, init2 is fast but broken. The rest is only for testing purposes.

import time

def init1(d: dict):
    for i in range(1000):
        l = []
        for k in range(i):
            l.append(k)
        # Assigning to the managed dict serializes the complete list.
        d[i] = l

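def init2(d: dict):
    for i in range(1000):
        l = []
        # The list is copied into the managed dict here, while still empty ...
        d[i] = l
        for k in range(i):
            # ... so these appends only change the local list, not the dict.
            l.append(i)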

def test1():
    import multiprocess as mp
    with mp.Manager() as manager:
        d = manager.dict()
        p = mp.Process(target=init1, args=(d,))
        p.start()
        p.join()
        print(d)

def test2():
    import multiprocess as mp
    with mp.Manager() as manager:
        d = manager.dict()
        p = mp.Process(target=init2, args=(d,))
        p.start()
        p.join()
        print(d)

start = time.time()
test1()
end = time.time()
print('test1: ', end - start)


start = time.time()
test2()
end = time.time()
print('test2: ', end - start)
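The difference between init1 and init2 can be reproduced in isolation. The sketch below is an illustration, not part of the original code: it assumes the standard-library multiprocessing module as a stand-in for multiprocess, assumes a POSIX system where the "fork" start method is available, and uses hypothetical helper names (fill_after_assign, fill_before_assign, run). It shows that a managed dict stores a copy of the list made at assignment time, so later appends to the local list are not seen by the parent.

```python
import multiprocessing as mp  # assumption: stdlib stand-in for `multiprocess`


def fill_after_assign(d):
    """Same pattern as init2: assign first, mutate afterwards."""
    l = []
    d[0] = l      # the empty list is serialized and copied to the manager here
    l.append(1)   # only the local list changes; the managed dict is not updated


def fill_before_assign(d):
    """Same pattern as init1: build the list fully, then assign it."""
    l = []
    l.append(1)
    d[0] = l      # the complete list is serialized once


def run(target, start_method="fork"):
    """Run `target` in a child process against a managed dict and return a plain copy."""
    ctx = mp.get_context(start_method)  # assumption: "fork" is available (POSIX)
    with ctx.Manager() as manager:
        d = manager.dict()
        p = ctx.Process(target=target, args=(d,))
        p.start()
        p.join()
        return dict(d)  # copy into a plain dict before the manager shuts down
```

Under these assumptions, run(fill_after_assign) yields a dict whose list is still empty, while run(fill_before_assign) yields the filled list. This is why init2 is fast: it only ever ships empty lists.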

Recommended answer

A possible solution using pipes: the child builds a plain dict locally and sends it to the parent in a single message, so the data is serialized once rather than on every assignment. On my PC this takes 870 ms, compared to 1.10 s for test1 and 200 ms for test2.

def init3(child_conn):
    d = {}
    for i in range(1000):
        l = []
        for k in range(i):
            l.append(i)
        d[i] = l
    child_conn.send(d)

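def test3():
    import multiprocess as mp
    # duplex=False: parent_conn can only receive, child_conn can only send.
    parent_conn, child_conn = mp.Pipe(duplex=False)
    p = mp.Process(target=init3, args=(child_conn,))
    p.start()
    # Receive before joining so the child is not left blocked on a full pipe.
    d = parent_conn.recv()
    p.join()
    return d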

In Jupyter, using the %timeit magic, I get:

In [01]: %timeit test3()
872 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [02]: %timeit test2()
199 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [03]: %timeit test1()
1.09 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
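The pipe approach can also be extended to several workers, each building part of the dict and sending it back over its own pipe. The following is only a sketch under stated assumptions: it uses the standard-library multiprocessing module in place of multiprocess, assumes the "fork" start method is available, and the names build_chunk and build_parallel as well as the key-splitting scheme are hypothetical. It has not been timed against the results above.

```python
import multiprocessing as mp  # assumption: stdlib stand-in for `multiprocess`


def build_chunk(conn, keys):
    """Build a plain dict for the given keys and send it in one message."""
    part = {i: [i] * i for i in keys}  # same contents as init3: i repeated i times
    conn.send(part)
    conn.close()


def build_parallel(n_items=1000, n_workers=4, start_method="fork"):
    """Split the keys across workers and merge the partial dicts in the parent."""
    ctx = mp.get_context(start_method)  # assumption: "fork" is available (POSIX)
    workers = []
    for w in range(n_workers):
        parent_conn, child_conn = ctx.Pipe(duplex=False)
        keys = range(w, n_items, n_workers)  # hypothetical round-robin split
        p = ctx.Process(target=build_chunk, args=(child_conn, keys))
        p.start()
        child_conn.close()  # the parent does not need the sending end
        workers.append((p, parent_conn))

    result = {}
    for p, parent_conn in workers:
        # Receive before joining so a worker is never blocked on a full pipe.
        result.update(parent_conn.recv())
        p.join()
    return result
```

Each worker still serializes its partial dict exactly once, so the number of messages stays equal to the number of workers rather than the number of keys. Whether this is faster than test3 depends on how expensive the real objects are to build and to serialize.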
