Dramatic slow down using multiprocess and numpy in Python

Problem description

I wrote Python code for a Q-learning algorithm, and since the algorithm has random output I have to run it multiple times. Thus I use the multiprocessing module. The structure of the code is as follows:

import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...

# ...define some parameters here...

# using multiprocessing
result = []
num_threads = 3
pool = mp.Pool(num_threads)
for cnt in range(num_threads):
    args = RL_params + phys_params  # tuple of positional arguments for Q_learning
    result.append(pool.apply_async(Q_learning, args))

pool.close()
pool.join()
outputs = [r.get() for r in result]  # collect the return values

There is no I/O operation in my code, and my workstation has 6 cores (12 threads) and enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds, and the job occupies a single thread at 100% CPU usage (observed with the top command).

[Screenshot: CPU status with num_threads=1]

However, if I run it with num_threads=3 (or more), it takes more than 40 seconds, and the job occupies 3 threads, each using 100% of a CPU core.

[Screenshot: CPU status with num_threads=3]

I can't understand this slowdown, because there is no parallelization inside any of the self-defined functions and no I/O operation. It is also interesting that when num_threads=1 the CPU usage is always less than 100%, but when num_threads is larger than 1 the CPU usage may sometimes reach 101% or 102%.

On the other hand, I wrote another simple test file which does not import numpy and scipy, and the problem never shows up there. I have noticed the question why isn't numpy.mean multithreaded?, and it seems my problem is due to the automatic parallelization of some methods in numpy (such as dot). But as the pictures show, I can't see any parallelization when I run a single job.
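
If BLAS-level threading is the suspect, it can be ruled out by pinning numpy's backing BLAS library to a single thread. The sketch below is illustrative rather than part of the original question; the environment variables cover the common backends and must be set before numpy is first imported:

import os

# Pin the BLAS/OpenMP thread pools to one thread each.
# These variables are read when numpy is imported, so set them first.
os.environ["OMP_NUM_THREADS"] = "1"        # generic OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS builds
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL builds

import numpy as np

np.show_config()  # shows which BLAS backend this numpy build links against

With the pools pinned, each worker process keeps to a single core, which makes any oversubscription of the CPU easy to spot.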

Answer

When you use a multiprocessing pool, all the arguments and results get sent through pickle. This can be very processor-intensive and time-consuming, and it may be the source of your problem, especially if your arguments and/or results are large. In those cases, Python may spend more time pickling and unpickling the data than it spends running computations.
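
A quick way to check whether serialization dominates is to time the pickle round-trip of a representative argument. A minimal sketch, where the 2000x2000 array is an arbitrary stand-in for the real RL_params/phys_params:

import pickle
import time

import numpy as np

data = np.random.rand(2000, 2000)  # stand-in for a large argument or result

t0 = time.perf_counter()
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
pickle.loads(blob)
t1 = time.perf_counter()
print(f"pickle round-trip: {t1 - t0:.3f} s ({len(blob) / 1e6:.1f} MB)")

t2 = time.perf_counter()
data @ data  # representative numpy computation
t3 = time.perf_counter()
print(f"matrix multiply:   {t3 - t2:.3f} s")

If the round-trip takes about as long as the computation itself, the pool overhead will swallow any parallel speed-up.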

However, numpy releases the global interpreter lock during computations, so if your work is numpy-intensive, you may be able to speed it up by using threading instead of multiprocessing. That would avoid the pickling step. See here for more details: https://stackoverflow.com/a/38775513/3830997
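
A minimal sketch of the threading variant, assuming the workload is numpy-heavy; q_learning_stub below is a hypothetical stand-in for the question's Q_learning:

from concurrent.futures import ThreadPoolExecutor

import numpy as np

def q_learning_stub(n):
    # stand-in for Q_learning: numpy work that releases the GIL
    a = np.random.rand(n, n)
    return (a @ a).trace()

num_threads = 3
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # threads share memory, so arguments and results are never pickled
    futures = [executor.submit(q_learning_stub, 1500) for _ in range(num_threads)]
    results = [f.result() for f in futures]

print(results)

This only helps if the hot loops are inside numpy/scipy calls; pure-Python sections still serialize on the GIL.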
