Why does perf show that sleep takes all cores?

Question

I am trying to familiarize myself with perf and run it against various programs I wrote.

When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event). Here's the example output:

perf stat  -a --per-core python3 test.py

Performance counter stats for 'system wide':

    S0-C0           1       19004.951263      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C0           1              5,582      context-switches                                              (100.00%)
    S0-C0           1                 19      cpu-migrations                                                (100.00%)
    S0-C0           1              3,746      page-faults                                                 
    S0-C0           1    <not supported>      cycles                   
    S0-C0           1    <not supported>      stalled-cycles-frontend  
    S0-C0           1    <not supported>      stalled-cycles-backend   
    S0-C0           1    <not supported>      instructions             
    S0-C0           1    <not supported>      branches                 
    S0-C0           1    <not supported>      branch-misses            
    S0-C1           1       19004.950059      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C1           1              6,752      context-switches                                              (100.00%)
    S0-C1           1                 25      cpu-migrations                                                (100.00%)
    S0-C1           1                935      page-faults                                                 
    S0-C1           1    <not supported>      cycles                   
    S0-C1           1    <not supported>      stalled-cycles-frontend  
    S0-C1           1    <not supported>      stalled-cycles-backend   
    S0-C1           1    <not supported>      instructions             
    S0-C1           1    <not supported>      branches                 
    S0-C1           1    <not supported>      branch-misses            

      19.004688019 seconds time elapsed

It even shows that a simple sleep command takes two cores on my computer, and I can't explain this. I understand that the OS scheduler can move a process to a different core, but in that case CPU utilization would reflect it.

Can anyone explain this?

Thanks!

Answer

According to the man page of the perf stat subcommand, you used the -a option, which profiles the full system: http://man7.org/linux/man-pages/man1/perf-stat.1.html

   -a, --all-cpus
       system-wide collection from all CPUs (default if no target is
       specified)

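In this "system-wide" mode, perf stat (and perf record too) counts events on (or, for record, profiles) all CPUs in the system. When used without a command argument, perf runs until interrupted by Ctrl-C; with a command argument, it counts/profiles until the command exits. This is what your output shows: each core's task-clock (about 19005 msec) matches the 19.004688019 seconds of elapsed time, because in system-wide mode the counter runs on every CPU for the whole interval, whether or not your program (or sleep) is running there. The "two cores" are simply the two cores of your machine, each counted for the full run. Typical usage is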

perf stat -a sleep 10      # Profile counting every CPU for 10 seconds
perf record -a sleep 10    # Profile with cycles every CPU for 10 seconds to perf.data
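To see why both cores report about 1.000 CPUs utilized, the minimal Python sketch below redoes the arithmetic with the two per-core task-clock values and the elapsed time copied from the question's output. It assumes that "CPUs utilized" is roughly task-clock time divided by elapsed wall-clock time; the names TASK_CLOCK_MSEC, ELAPSED_SEC and cpus_utilized are hypothetical, not part of perf.

```python
# Per-core task-clock values (msec) and elapsed time (s), copied from the
# `perf stat -a --per-core python3 test.py` output in the question.
TASK_CLOCK_MSEC = {
    "S0-C0": 19004.951263,
    "S0-C1": 19004.950059,
}
ELAPSED_SEC = 19.004688019


def cpus_utilized(task_clock_msec: float, elapsed_sec: float) -> float:
    """Ratio of counted task-clock time to wall-clock time.

    Assumed to mirror perf's "CPUs utilized" column: 1.0 means the counter
    ran for the entire elapsed interval.
    """
    return task_clock_msec / (elapsed_sec * 1000.0)


if __name__ == "__main__":
    total = 0.0
    for core, msec in TASK_CLOCK_MSEC.items():
        ratio = cpus_utilized(msec, ELAPSED_SEC)
        total += ratio
        print(f"{core}: {ratio:.3f} CPUs utilized")
    # Two cores, each counted for the whole run, add up to about 2 "CPUs".
    print(f"total: {total:.3f}")
```

The per-core ratios track the length of the measurement window rather than how busy test.py was.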

To get statistics for a single command, use single-process profiling (without the -a option):

perf stat python3 test.py
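For comparison, per-process mode counts only the time your command actually spends on a CPU. The following Python sketch illustrates the same idea without perf: time.process_time() returns the CPU time of the current process and time.perf_counter() the wall-clock time, so their ratio is a rough stand-in for the "CPUs utilized" figure of a per-process perf stat run. This is an analogy, not how perf measures; cpu_share and the two workloads are hypothetical stand-ins for sleep and for a single-threaded test.py.

```python
import time


def cpu_share(workload) -> float:
    """Return CPU time used by this process divided by wall-clock time.

    time.process_time() counts user + system CPU time of the current process
    and excludes time spent sleeping, roughly like per-process task-clock.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def sleep_workload() -> None:
    # Like `sleep`: the process is blocked, so it uses almost no CPU time.
    time.sleep(0.3)


def busy_workload() -> None:
    # Hypothetical stand-in for a single-threaded, CPU-bound test.py.
    deadline = time.perf_counter() + 0.3
    n = 0
    while time.perf_counter() < deadline:
        n += 1


if __name__ == "__main__":
    print(f"sleep: {cpu_share(sleep_workload):.2f} CPUs")
    print(f"busy loop: {cpu_share(busy_workload):.2f} CPUs")
```

A sleeping process should land near 0 and a single-threaded busy loop near 1 on an otherwise idle machine, which is roughly the range perf stat reports for such programs when run without -a.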

For profiling (perf record), you may run without the -a option; or you may use -a and later filter manually in perf report, focusing only on the PIDs/TIDs/DSOs of your application. (This can be very useful if the profiled command sends interprocess requests to other daemons that do a lot of CPU work.)

The --per-core, -A, -C <cpulist>, and --per-socket options apply only to system-wide (-a) mode. For a per-thread breakdown, try --per-thread together with the -p <pid> option, which attaches to a running process.
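If you drive perf from a script, a small sketch like the following assembles the invocations suggested in this answer as argument lists suitable for subprocess.run. The helper names are hypothetical, 1234 is a placeholder PID, and only options documented in the perf-stat, perf-record and perf-report man pages are used (--, -a, --per-thread, -p, -i, --pid).

```python
import shlex
from typing import List


def perf_stat_single(cmd: List[str]) -> List[str]:
    # Per-process counting: without -a only `cmd` (and its children) is counted.
    return ["perf", "stat", "--", *cmd]


def perf_record_system_wide(cmd: List[str]) -> List[str]:
    # System-wide sampling on all CPUs for as long as `cmd` runs.
    return ["perf", "record", "-a", "--", *cmd]


def perf_report_pid(pid: int, data_file: str = "perf.data") -> List[str]:
    # Manual filtering after a system-wide record: keep one process's samples.
    return ["perf", "report", "-i", data_file, f"--pid={pid}"]


def perf_stat_per_thread(pid: int) -> List[str]:
    # Attach to a running process and report counts per thread.
    return ["perf", "stat", "--per-thread", "-p", str(pid)]


if __name__ == "__main__":
    workload = ["python3", "test.py"]
    for argv in (
        perf_stat_single(workload),
        perf_record_system_wide(workload),
        perf_report_pid(1234),  # 1234 is a placeholder PID
        perf_stat_per_thread(1234),
    ):
        # Print a shell-ready command line; pass `argv` to subprocess.run to execute it.
        print(shlex.join(argv))
```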