High kernel CPU usage when running multiple Python programs

Problem description

    I developed a Python program that does heavy numerical calculations. I run it on a Linux machine with 32 Xeon CPUs, 64 GB of RAM, and 64-bit Ubuntu 14.04. I launch multiple Python instances with different model parameters in parallel, so that I use multiple processes and do not have to worry about the global interpreter lock (GIL). When I monitor CPU utilization with htop, I see that all cores are busy, but most of the time is spent in the kernel. Generally, kernel time is more than twice the user time. I'm afraid there is a lot of overhead at the system level, but I have not been able to find the cause.

    How would one reduce the high kernel CPU usage?

    Here are some observations I made:

    • This effect appears independent of whether I run 10 jobs or 50. If there are fewer jobs than cores, not all cores are used, but the ones that are used still have a high CPU usage by the kernel
    • I implemented the inner loop using numba, but the problem is not related to this, since removing the numba part does not resolve the problem
    • I also thought that it might be related to using python2, similar to the problem mentioned in this SO question, but switching from python2 to python3 did not change much
    • I measured the total number of context switches performed by the OS, which is about 10000 per second. I'm not sure whether this is a large number
    • I tried increasing the python time slices by setting sys.setcheckinterval(10000) (for python2) and sys.setswitchinterval(10) (for python3) but none of this helped
    • I tried influencing the task scheduler by running schedtool -B PID but this didn't help

    Edit: Here is a screenshot of htop:

    I also ran perf record -a -g and this is the report by perf report -g graph:

    Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
    -  95.25%          python3  [kernel.kallsyms]                           [k] _raw_spin_lock_irqsave
       - _raw_spin_lock_irqsave
          - 95.01% extract_buf
               extract_entropy_user
               urandom_read
               vfs_read
               sys_read
               system_call_fastpath
               __GI___libc_read
    -   2.06%          python3  [kernel.kallsyms]                           [k] sha_transform
       - sha_transform
          - 2.06% extract_buf
               extract_entropy_user
               urandom_read
               vfs_read
               sys_read
               system_call_fastpath
               __GI___libc_read
    -   0.74%          python3  [kernel.kallsyms]                           [k] _mix_pool_bytes
       - _mix_pool_bytes
          - 0.74% __mix_pool_bytes
               extract_buf
               extract_entropy_user
               urandom_read
               vfs_read
               sys_read
               system_call_fastpath
               __GI___libc_read
        0.44%          python3  [kernel.kallsyms]                           [k] extract_buf
        0.15%          python3  python3.4                                   [.] 0x000000000004b055
        0.10%          python3  [kernel.kallsyms]                           [k] memset
        0.09%          python3  [kernel.kallsyms]                           [k] copy_user_generic_string
        0.07%          python3  multiarray.cpython-34m-x86_64-linux-gnu.so  [.] 0x00000000000b4134
        0.06%          python3  [kernel.kallsyms]                           [k] _raw_spin_unlock_irqresto
        0.06%          python3  python3.4                                   [.] PyEval_EvalFrameEx
    

    It seems as if most of the time is spent calling _raw_spin_lock_irqsave. I have no idea what this means, though.

    Solution

    If the time is being spent in the kernel, narrow the problem down with a profiler such as OProfile or perf.

    For example, run perf record -a -g, then use perf report to read the profiling data saved in perf.data. See also: linux perf: how to interpret and find hotspots.


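    In your case, the high CPU usage is caused by contention for /dev/urandom: the kernel serializes reads from it behind a spinlock (the _raw_spin_lock_irqsave under urandom_read in your perf report), so only one thread can read at a time, yet many Python processes are reading from it at once.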

    The Python random module uses /dev/urandom only for initialization (seeding), i.e.:

    $ strace python -c 'import random;
    while True:
        random.random()'
    open("/dev/urandom", O_RDONLY)     = 4
    read(4, "163636636}"..., 2500) = 2500
    close(4)                                   <--- /dev/urandom is closed
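
The trace shows a single seeding read, after which the generator runs in user space. As a rough illustration of keeping it that way, a job can seed its own random.Random instance with an explicit integer, so the draws themselves never go back to /dev/urandom. The names make_rng, simulate, job_id and base_seed are hypothetical and not from the question; this is a sketch, not necessarily what the asker's code needs.

```python
import random


def make_rng(job_id, base_seed=12345):
    """Build a generator for one job, seeded once with an explicit integer.

    job_id and base_seed are hypothetical names; any scheme that gives each
    job a distinct integer seed would do for this sketch.
    """
    return random.Random(base_seed + job_id)


def simulate(job_id, n_draws):
    """Draw n_draws floats in [0, 1) from the job's own generator."""
    rng = make_rng(job_id)
    # Every draw is a pure user-space Mersenne Twister step.
    return [rng.random() for _ in range(n_draws)]
```

A side effect is that repeated calls with the same job_id return the same sequence, which makes runs reproducible. Using adjacent integers as seeds is an assumption made for brevity here.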
    

    You may also be requesting data from /dev/urandom explicitly, through os.urandom or the random.SystemRandom class. So check the parts of your code that deal with random numbers.
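
One way to start that check is a plain text search for the two names mentioned above. The helper below is a minimal sketch: find_urandom_uses and URANDOM_PATTERN are hypothetical names, and the comment handling is deliberately crude (it cuts each line at the first '#', even inside a string). It will not see reads made inside third-party libraries, for which strace or perf remain the better tools.

```python
import re

# Names taken from the answer: os.urandom and SystemRandom.
# \burandom\b also matches "os.urandom" and "from os import urandom".
URANDOM_PATTERN = re.compile(r"\b(urandom|SystemRandom)\b")


def find_urandom_uses(source):
    """Return (line_number, stripped_line) pairs that mention an OS-entropy API."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop anything after a '#'
        if URANDOM_PATTERN.search(code):
            hits.append((number, line.strip()))
    return hits
```

For a file on disk, something like find_urandom_uses(open(path).read()) returns the suspicious lines with their line numbers.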
