Why is get() slow in multiprocessing?

Question

I have a basic multiprocessing class which takes some parameters and sends them off to a worker:

import multiprocessing as mp
import time


class Multi(object):
    def __init__(self, pool_parameters, pool_size):
        self.pool_parameters = pool_parameters  # Parameters in a tuple
        self.pool_size = pool_size
        self.pool = mp.Pool(self.pool_size)
        # Submit one task per pool process, each with its own parameter set.
        self.results = [
            self.pool.apply_async(worker, args=(self.pool_parameters[i],))
            for i in range(self.pool_size)
        ]
        time1 = time.time()
        # get() blocks until each worker's result has been sent back.
        self.output = [r.get() for r in self.results]  # Output objects in here
        print(time.time() - time1)


def worker(*args):
    # Do stuff
    return stuff

However the r.get() line seems to take ages. If I have a pool_size of 1, the worker returns its result in 0.1 seconds, but the r.get() line takes another 1.35 seconds. Why does it take so long, especially if only one process is started?

For a single process, with the worker returning a single None value, the self.output line still takes 1.3 seconds on my system (timed with time.time()).

Sorry, I found the problem and I don't think it has to do with multiprocessing. The problem seems to come from importing various other modules. When I got rid of my imports, the time was 0.1 seconds. No idea why, though...

Recommended answer

You're seeing poor performance because you're sending a large object between the processes. Pickling the object in the child, sending those bytes between processes, and then unpickling them in the parent takes a non-trivial amount of time. This is one of the reasons the best practices for multiprocessing suggest avoiding large amounts of shared state:

Avoid shared state

As far as possible one should try to avoid shifting large amounts of data between processes.

You'll probably be able to isolate this behavior if you call pickle.loads(pickle.dumps(obj)) on your object. I would expect it to take almost as long as the get() call.
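
A minimal sketch of that check, assuming Python 3. The helper name time_pickle_round_trip and the two sample objects are hypothetical stand-ins for whatever the worker actually returns. The pickler that multiprocessing uses internally may differ in detail from a plain pickle.dumps call, so treat the numbers as a rough estimate of the serialization cost rather than an exact match for get().

```python
import pickle
import time


def time_pickle_round_trip(obj):
    """Return (seconds, payload_bytes) for one pickle round trip of obj."""
    start = time.perf_counter()
    payload = pickle.dumps(obj)   # roughly what the child does before sending
    pickle.loads(payload)         # roughly what the parent does inside get()
    elapsed = time.perf_counter() - start
    return elapsed, len(payload)


# Hypothetical stand-ins for a worker's return value.
small_result = None
large_result = list(range(10 ** 6))

small_seconds, small_bytes = time_pickle_round_trip(small_result)
large_seconds, large_bytes = time_pickle_round_trip(large_result)

print("small: %d bytes in %.6f s" % (small_bytes, small_seconds))
print("large: %d bytes in %.6f s" % (large_bytes, large_seconds))
```

If the round trip for the real return value is far shorter than the measured get() time, as it would be for a worker that returns None, serialization is probably not the bottleneck and the delay lies elsewhere.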
