High Kernel CPU when running multiple python programs
Question
I developed a python program that does heavy numerical calculations. I run it on a linux machine with 32 Xeon CPUs, 64GB RAM, and Ubuntu 14.04 64-bit. I launch multiple python instances with different model parameters in parallel to use multiple processes without having to worry about the global interpreter lock (GIL). When I monitor the cpu utilization using `htop`, I see that all cores are used, however most of the time by the kernel. Generally, the kernel time is more than twice the user time. I'm afraid that there is a lot of overhead going on at the system level, but I'm not able to find the cause of it.
How can I reduce the high kernel CPU usage?
Here are my observations:
- This effect appears independent of whether I run 10 jobs or 50. If there are fewer jobs than cores, not all cores are used, but the ones that are used still show high CPU usage by the kernel
- I implemented the inner loop using numba, but the problem is not related to this, since removing the numba part does not resolve the problem
- I also thought that it might be related to using python2, similar to the problem mentioned in this SO question, but switching from python2 to python3 did not change much
- I measured the total number of context switches performed by the OS, which is about 10000 per second. I'm not sure whether this is a large number
- I tried increasing the python time slices by setting `sys.setcheckinterval(10000)` (for python2) and `sys.setswitchinterval(10)` (for python3), but none of this helped
- I tried influencing the task scheduler by running `schedtool -B PID`, but this didn't help
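The context-switch rate mentioned above can be checked with standard Linux interfaces; a minimal sketch (sampling `/proc/stat` this way is my illustration, not part of the question):

```shell
# System-wide context switches since boot: the "ctxt" line of /proc/stat.
# Sampling it twice, one second apart, gives the per-second rate.
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $((c2 - c1))"

# Per-process voluntary/involuntary counts (shown here for the current
# shell; substitute the PID of one of the python workers):
grep ctxt_switches /proc/self/status
```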
Edit:
Here is a screenshot of `htop`:
I also ran `perf record -a -g`, and this is the report from `perf report -g graph`:
```
Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
-  95.25%  python3  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.01% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   2.06%  python3  [kernel.kallsyms]   [k] sha_transform
   - sha_transform
      - 2.06% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   0.74%  python3  [kernel.kallsyms]   [k] _mix_pool_bytes
   - _mix_pool_bytes
      - 0.74% __mix_pool_bytes
           extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
    0.44%  python3  [kernel.kallsyms]   [k] extract_buf
    0.15%  python3  python3.4           [.] 0x000000000004b055
    0.10%  python3  [kernel.kallsyms]   [k] memset
    0.09%  python3  [kernel.kallsyms]   [k] copy_user_generic_string
    0.07%  python3  multiarray.cpython-34m-x86_64-linux-gnu.so  [.] 0x00000000000b4134
    0.06%  python3  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
    0.06%  python3  python3.4           [.] PyEval_EvalFrameEx
```
It seems as if most of the time is spent calling `_raw_spin_lock_irqsave`. I have no idea what this means, though.
Answer
If the problem is in the kernel, you should narrow it down using a profiler such as OProfile or perf.
I.e. run `perf record -a -g` and then read the profiling data saved into `perf.data` using `perf report`. See also: linux perf: how to interpret and find hotspots.
In your case the high CPU usage is caused by contention for `/dev/urandom` -- it allows only one thread to read from it at a time, but multiple Python processes are doing so.
The Python module `random` uses it only for initialization, i.e.:
```
$ strace python -c 'import random;
while True:
    random.random()'
open("/dev/urandom", O_RDONLY)          = 4
read(4, "\16\36\366\36}"..., 2500)      = 2500
close(4)                                <--- /dev/urandom is closed
```
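The same point can be seen from Python itself: after that single initialization read, `random.random()` runs entirely in user space (a minimal sketch of my own, not from the original answer):

```python
import random

# The module-level Random instance was already seeded (from /dev/urandom)
# at import time; subsequent calls use the Mersenne Twister purely in
# user space, with no further kernel reads.
values = [random.random() for _ in range(1000)]
assert all(0.0 <= v < 1.0 for v in values)

# Re-seeding with an explicit value also involves no /dev/urandom read,
# and makes the stream reproducible:
random.seed(12345)
a = random.random()
random.seed(12345)
b = random.random()
assert a == b  # deterministic, no kernel entropy involved
```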
You may also explicitly ask for `/dev/urandom` by using `os.urandom` or the `SystemRandom` class. So check your code which is dealing with random numbers.