High Kernel CPU when running multiple Python programs

Problem description

I developed a Python program that does heavy numerical calculations. I run it on a Linux machine with 32 Xeon CPUs, 64 GB of RAM, and 64-bit Ubuntu 14.04. I launch multiple Python instances with different model parameters in parallel, so that I can use multiple processes without having to worry about the global interpreter lock (GIL). When I monitor CPU utilization with htop, I see that all cores are in use, but most of that time is spent in the kernel. Generally, kernel time is more than twice user time. I'm afraid there is a lot of overhead at the system level, but I haven't been able to find the cause.

How can I reduce this high kernel CPU usage?

Here are my observations:

  • This effect appears independent of whether I run 10 jobs or 50. If there are fewer jobs than cores, not all cores are used, but the ones that are used still show high kernel CPU usage.
  • I implemented the inner loop using numba, but the problem is not related to this, since removing the numba part does not resolve the problem.
  • I also thought it might be related to using Python 2, similar to the problem mentioned in this SO question, but switching from Python 2 to Python 3 did not change much.
  • I measured the total number of context switches performed by the OS, which is about 10000 per second. I'm not sure whether this is a large number.
  • I tried increasing the Python time slices by setting sys.setcheckinterval(10000) (for Python 2) and sys.setswitchinterval(10) (for Python 3), but neither helped.
  • I tried influencing the task scheduler by running schedtool -B PID, but this didn't help either.
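To put the figure of roughly 10000 context switches per second in perspective, one way to measure the system-wide rate on Linux is to sample the cumulative `ctxt` counter in /proc/stat twice and divide by the interval. This is a minimal sketch assuming a Linux /proc filesystem; `parse_ctxt` and `context_switch_rate` are hypothetical helper names, not part of any library.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The relevant line has the form "ctxt <count>" (all CPUs combined).
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def context_switch_rate(interval=1.0, path="/proc/stat"):
    """Sample the system-wide counter twice; return switches per second."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return (after - before) / interval
```

On Linux, `print(context_switch_rate())` reports the rate over a one-second window. Because the counter covers all CPUs, 10000 switches per second on a 32-CPU machine works out to roughly 310 per CPU per second.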

Edit: Here is a screenshot of htop:

I also ran perf record -a -g, and this is the output of perf report -g graph:

Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
-  95.25%          python3  [kernel.kallsyms]                           [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.01% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   2.06%          python3  [kernel.kallsyms]                           [k] sha_transform
   - sha_transform
      - 2.06% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   0.74%          python3  [kernel.kallsyms]                           [k] _mix_pool_bytes
   - _mix_pool_bytes
      - 0.74% __mix_pool_bytes
           extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
    0.44%          python3  [kernel.kallsyms]                           [k] extract_buf
    0.15%          python3  python3.4                                   [.] 0x000000000004b055
    0.10%          python3  [kernel.kallsyms]                           [k] memset
    0.09%          python3  [kernel.kallsyms]                           [k] copy_user_generic_string
    0.07%          python3  multiarray.cpython-34m-x86_64-linux-gnu.so  [.] 0x00000000000b4134
    0.06%          python3  [kernel.kallsyms]                           [k] _raw_spin_unlock_irqresto
    0.06%          python3  python3.4                                   [.] PyEval_EvalFrameEx

It seems as if most of the time is spent calling _raw_spin_lock_irqsave. I have no idea what this means, though.

Answer

If the problem is in the kernel, you should narrow it down using a profiler such as OProfile or perf.

I.e. run perf record -a -g, then read the profiling data saved to perf.data using perf report. See also: linux perf: how to interpret and find hotspots.

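In your case, the high CPU usage is caused by contention on /dev/urandom. Reads from it serialize on a spinlock protecting the kernel's entropy pool (the _raw_spin_lock_irqsave under extract_buf in your profile), so effectively only one reader can extract data at a time, yet multiple Python processes are all reading from it.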
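A minimal way to try to reproduce this kind of contention, assuming a kernel like the one in the question where every urandom read takes the same pool lock: start several processes that each call os.urandom in a tight loop, then watch the kernel share of CPU time in htop or with perf. The helper names, worker count, and read sizes below are arbitrary illustration values. On newer Python versions os.urandom may use the getrandom() syscall rather than reading the device file, so the kernel stack in perf may look different.

```python
import os
import time
from multiprocessing import Pool


def hammer_urandom(n_reads, size=16):
    """Read `size` bytes from the kernel CSPRNG `n_reads` times.

    Each os.urandom() call is a separate syscall (read or getrandom),
    so this loop spends most of its time in the kernel.
    """
    total = 0
    for _ in range(n_reads):
        total += len(os.urandom(size))
    return total


def run(workers=8, n_reads=50_000):
    """Run `workers` processes in parallel; return (bytes read, seconds)."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        totals = pool.map(hammer_urandom, [n_reads] * workers)
    return sum(totals), time.perf_counter() - start


if __name__ == "__main__":
    nbytes, secs = run()
    print(f"read {nbytes} bytes in {secs:.2f}s")
```

Comparing the elapsed time for workers=1 and workers=8 gives a rough sense of how much the reads serialize on a given kernel.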

The Python random module uses /dev/urandom only for initialization (seeding), e.g.:

$ strace python -c 'import random;
while True:
    random.random()'
open("/dev/urandom", O_RDONLY)     = 4
read(4, "\16\36\366\36}"..., 2500) = 2500
close(4)                                   <--- /dev/urandom is closed
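The strace output shows the pattern that avoids the contention: touch the kernel random source once for a seed, then generate every number in user space. A sketch of applying the same idea to parallel workers, assuming each process builds its own random.Random from a single os.urandom seed; `make_rng` and `simulate` are hypothetical names standing in for your own code.

```python
import os
import random


def make_rng(nbytes=32):
    """Create a per-process generator seeded once from the kernel CSPRNG.

    Only this call touches the kernel's random source; every later
    rng.random() call runs the Mersenne Twister in user space.
    """
    seed = int.from_bytes(os.urandom(nbytes), "big")
    return random.Random(seed)


def simulate(n, rng):
    """Hypothetical stand-in for a model run that needs many random draws."""
    return sum(rng.random() for _ in range(n)) / n
```

Each worker would call `rng = make_rng()` once at startup and pass `rng` into its model code. random.Random is not a cryptographically secure generator, which is fine for numerical simulation but not for security-sensitive values. Passing a fixed seed instead (e.g. `random.Random(1)`) also makes individual runs reproducible.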

You may also be requesting /dev/urandom explicitly through os.urandom or the random.SystemRandom class, so check the parts of your code that deal with random numbers.
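One rough way to do that check is to walk each source file's syntax tree and flag calls that go to the kernel on every use. Note that every draw from a SystemRandom instance calls os.urandom internally, so constructing one inside a hot loop, or drawing from it there, costs a syscall per number. This sketch uses the standard ast module and only recognizes the direct spellings `os.urandom(...)`, `urandom(...)`, `SystemRandom(...)` and `random.SystemRandom(...)`; renamed imports (e.g. `from os import urandom as u`) and method calls on an existing SystemRandom instance will be missed. The function name is hypothetical.

```python
import ast
import sys

# Names whose calls read from the kernel's random source.
SUSPECT_NAMES = {"urandom", "SystemRandom"}


def find_kernel_random_calls(source, filename="<string>"):
    """Return sorted (line, name) pairs for urandom/SystemRandom calls."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches os.urandom(...) and random.SystemRandom(...).
        if isinstance(func, ast.Attribute) and func.attr in SUSPECT_NAMES:
            hits.append((node.lineno, func.attr))
        # Matches urandom(...) or SystemRandom(...) after a from-import.
        elif isinstance(func, ast.Name) and func.id in SUSPECT_NAMES:
            hits.append((node.lineno, func.id))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_urandom.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path) as f:
            for lineno, name in find_kernel_random_calls(f.read(), path):
                print(f"{path}:{lineno}: {name}")
```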