Dramatic slow down using multiprocessing and numpy in Python
Problem description
I wrote a Python program for a Q-learning algorithm, and I have to run it multiple times since this algorithm has random output. I therefore use the multiprocessing module. The structure of the code is as follows:
import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...
# ...define some parameters here...
# using multiprocessing
result = []
num_threads = 3
pool = mp.Pool(num_threads)
for cnt in range(num_threads):
    args = (RL_params + phys_params)  # arguments
    result.append(pool.apply_async(Q_learning, args))
pool.close()
pool.join()
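One detail worth noting about the pattern above: apply_async returns AsyncResult objects, so the actual return values still have to be fetched with .get(). A minimal runnable sketch of the same pattern (q_learning_stub is a hypothetical stand-in, since Q_learning and its parameters are not shown in the question):

```python
import multiprocessing as mp

# Hypothetical stand-in for Q_learning so the pattern is runnable here.
def q_learning_stub(a, b):
    return a + b

def run_pool(num_threads=3):
    with mp.Pool(num_threads) as pool:
        async_results = [pool.apply_async(q_learning_stub, (i, i))
                         for i in range(num_threads)]
        # .get() blocks until the corresponding worker has finished
        return [r.get() for r in async_results]

if __name__ == "__main__":
    print(run_pool())
```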
There is no I/O operation in my code, and my workstation has 6 cores (12 threads) and enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds, and the job occupies one thread at 100% CPU usage (observed with the top command).
However, if I run it with num_threads=3 (or more), it takes more than 40 seconds, and the job occupies 3 threads, each using 100% of a CPU core.
I can't understand this slowdown, because none of my self-defined functions are parallelized and there is no I/O operation. It is also interesting that when num_threads=1, CPU usage is always below 100%, but when num_threads is larger than 1, CPU usage may sometimes reach 101% or 102%.
On the other hand, I wrote another simple test file which does not import numpy or scipy, and this problem never shows up there. I have noticed the question why isn't numpy.mean multithreaded?, and it seems my problem may be due to the automatic parallelization of some methods in numpy (such as dot). But as I showed in the pictures, I can't see any parallelization when I run a single job.
Answer
When you use a multiprocessing pool, all the arguments and results get sent through pickle. This can be very processor-intensive and time-consuming, and it could be the source of your problem, especially if your arguments and/or results are large. In those cases, Python may spend more time pickling and unpickling the data than it spends running computations.
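As a rough illustration of that cost (the array shape here is arbitrary, chosen only to make the point), you can measure how large the pickled payload for a numpy array actually is:

```python
import pickle

import numpy as np

# Hypothetical illustration: a pool must serialize every argument and
# result, so a large numpy array means a large pickled payload per task.
arr = np.ones((1000, 1000))    # ~8 MB of float64 data
payload = pickle.dumps(arr)    # roughly what Pool does for each task

# The pickled blob carries the full buffer plus metadata.
assert len(payload) > arr.nbytes
```

Every call dispatched through the pool pays this serialization cost twice, once for the arguments and once for the result.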
However, numpy releases the global interpreter lock during computations, so if your work is numpy-intensive, you may be able to speed it up by using threading instead of multiprocessing. That avoids the pickling step. See here for more details: https://stackoverflow.com/a/38775513/3830997
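A minimal sketch of that threading approach, using multiprocessing.pool.ThreadPool (which mirrors the Pool API but keeps everything in one process). The numpy_heavy function is a hypothetical stand-in for Q_learning, since the real function is not shown:

```python
from multiprocessing.pool import ThreadPool

import numpy as np

# Hypothetical stand-in for Q_learning: numpy releases the GIL inside
# calls like matrix multiplication, so threads can overlap on real cores.
def numpy_heavy(seed):
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((200, 200))
    return float(np.trace(m @ m.T))

# ThreadPool passes Python objects between threads directly --
# no pickling of arguments or results.
with ThreadPool(3) as pool:
    results = pool.map(numpy_heavy, range(3))
```

Whether this beats multiprocessing depends on how much of your function's time is spent inside GIL-releasing numpy calls versus pure-Python code.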