High kernel CPU usage when running multiple Python programs

Views: 100 Published: 2020/5/1 9:09:37 python linux performance multiprocessing


Problem description

I developed a Python program that does heavy numerical calculations. I run it on a Linux machine with 32 Xeon CPUs, 64 GB of RAM, and 64-bit Ubuntu 14.04. I launch multiple Python instances with different model parameters in parallel, so that I use multiple processes without having to worry about the global interpreter lock (GIL). When I monitor CPU utilization with htop, I see that all cores are in use, but most of the time by the kernel. Generally, the kernel time is more than twice the user time. I suspect there is a lot of overhead at the system level, but I am unable to find its cause.
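To put a number on the kernel-versus-user split that htop displays, the aggregate `cpu` line of /proc/stat can be sampled twice and differenced. The following is a minimal sketch, assuming a Linux system. The helper names `parse_cpu_line`, `kernel_to_user_ratio` and `read_proc_stat` are made up for this illustration and are not part of the original program.

```python
import time


def parse_cpu_line(stat_text):
    """Return (user, system) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order: user nice system idle iowait irq softirq ...
            user = int(fields[1]) + int(fields[2])  # user + nice
            system = int(fields[3])                 # kernel (system) time
            return user, system
    raise ValueError("no aggregate 'cpu' line found")


def kernel_to_user_ratio(before_text, after_text):
    """Ratio of kernel to user CPU time between two /proc/stat snapshots."""
    user0, sys0 = parse_cpu_line(before_text)
    user1, sys1 = parse_cpu_line(after_text)
    user_delta = user1 - user0
    if user_delta == 0:
        return float("inf")
    return (sys1 - sys0) / user_delta


def read_proc_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    first = read_proc_stat()
    time.sleep(1)  # sampling window in seconds (arbitrary choice)
    second = read_proc_stat()
    print("kernel/user ratio:", kernel_to_user_ratio(first, second))
```

A ratio above 2 over a sampling window would correspond to the "kernel time is more than twice the user time" observation.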

How can I reduce the high kernel CPU usage?

Here are some observations I made:

This effect appears independent of whether I run 10 jobs or 50. If there are fewer jobs than cores, not all cores are used, but the ones that are used still show high CPU usage by the kernel.
I implemented the inner loop using numba, but the problem is not related to this, since removing the numba part does not resolve it.
I also thought that it might be related to using python2, similar to the problem mentioned in this SO question, but switching from python2 to python3 did not change much.
I measured the total number of context switches performed by the OS, which is about 10000 per second. I'm not sure whether this is a large number.
I tried increasing the Python time slices by setting sys.setcheckinterval(10000) (for python2) and sys.setswitchinterval(10) (for python3), but neither helped.
I tried influencing the task scheduler by running schedtool -B PID, but this didn't help either.
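The figure of 10000 switches per second above is system-wide. A per-process view can be obtained from inside a job with `resource.getrusage`, which separates voluntary switches (the process blocked or yielded) from involuntary ones (the scheduler preempted it). The following is a minimal sketch, assuming a Unix system. `context_switches_during` and `toy_workload` are hypothetical names used only for illustration; the real model code would take the place of the toy workload.

```python
import resource


def context_switches_during(func):
    """Run func() and return (voluntary, involuntary) context switches it incurred."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    func()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def toy_workload():
    # Stand-in for one model run: purely CPU-bound arithmetic.
    total = 0
    for i in range(1000000):
        total += i * i
    return total


if __name__ == "__main__":
    voluntary, involuntary = context_switches_during(toy_workload)
    print("voluntary:", voluntary, "involuntary:", involuntary)
```

Comparing these counts across jobs may help judge whether the system-wide number is driven by the Python processes themselves or by something else on the machine.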

Edit: Here is a screenshot of htop (image not preserved):

I also ran perf record -a -g, and this is the report produced by perf report -g graph:

Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
-  95.25%          python3  [kernel.kallsyms]                           [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.01% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   2.06%          python3  [kernel.kallsyms]                           [k] sha_transform
   - sha_transform
      - 2.06% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   0.74%          python3  [kernel.kallsyms]                           [k] _mix_pool_bytes
   - _mix_pool_bytes
      - 0.74% __mix_pool_bytes
           extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
    0.44%          python3  [kernel.kallsyms]                           [k] extract_buf
    0.15%          python3  python3.4                                   [.] 0x000000000004b055
    0.10%          python3  [kernel.kallsyms]                           [k] memset
    0.09%          python3  [kernel.kallsyms]                           [k] copy_user_generic_string
    0.07%          python3  multiarray.cpython-34m-x86_64-linux-gnu.so  [.] 0x00000000000b4134
    0.06%          python3  [kernel.kallsyms]                           [k] _raw_spin_unlock_irqresto
    0.06%          python3  python3.4                                   [.] PyEval_EvalFrameEx

It seems that most of the time is spent in _raw_spin_lock_irqsave. I have no idea what this means, though.
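Read from the bottom up, the call chain in the report starts at a read() system call, reaches urandom_read (the kernel's random device), and ends in a spinlock taken inside extract_buf. Whether the program, or a library it uses, requests kernel randomness in a hot loop is an assumption here and not something the report proves. Under that assumption, the sketch below compares the CPU time consumed by `random.SystemRandom`, which draws every value from `os.urandom`, with a `random.Random` instance seeded once from `os.urandom`. The helper name `cpu_time_of` is hypothetical.

```python
import os
import random


def cpu_time_of(func, repeats):
    """Return (user, system) CPU seconds consumed by calling func `repeats` times."""
    before = os.times()
    for _ in range(repeats):
        func()
    after = os.times()
    return after.user - before.user, after.system - before.system


# Kernel-backed source: every call requests fresh bytes from the kernel RNG.
kernel_rng = random.SystemRandom()

# User-space source: seeded once from the kernel, then pure user-space arithmetic.
seed = int.from_bytes(os.urandom(8), "little")
user_rng = random.Random(seed)

if __name__ == "__main__":
    n = 200000
    print("SystemRandom (user, sys):", cpu_time_of(kernel_rng.random, n))
    print("Random       (user, sys):", cpu_time_of(user_rng.random, n))
```

If the first line shows a much larger system-time component than the second, a similar pattern running in many processes at once would be one candidate explanation for contention on the lock seen in the report.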
