Why does perf show that sleep takes all cores?
Question
I am trying to familiarize myself with perf and run it against various programs I wrote.
When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event). Here's the example output:
perf stat -a --per-core python3 test.py
Performance counter stats for 'system wide':
S0-C0 1 19004.951263 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C0 1 5,582 context-switches (100.00%)
S0-C0 1 19 cpu-migrations (100.00%)
S0-C0 1 3,746 page-faults
S0-C0 1 <not supported> cycles
S0-C0 1 <not supported> stalled-cycles-frontend
S0-C0 1 <not supported> stalled-cycles-backend
S0-C0 1 <not supported> instructions
S0-C0 1 <not supported> branches
S0-C0 1 <not supported> branch-misses
S0-C1 1 19004.950059 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C1 1 6,752 context-switches (100.00%)
S0-C1 1 25 cpu-migrations (100.00%)
S0-C1 1 935 page-faults
S0-C1 1 <not supported> cycles
S0-C1 1 <not supported> stalled-cycles-frontend
S0-C1 1 <not supported> stalled-cycles-backend
S0-C1 1 <not supported> instructions
S0-C1 1 <not supported> branches
S0-C1 1 <not supported> branch-misses
19.004688019 seconds time elapsed
It even shows that a simple sleep command takes two cores on my computer, and I can't explain this. I understand that the OS scheduler can reassign the active core for any process, but in that case CPU utilization would reflect that.
Can anyone explain this?
Thanks!
Recommended answer
According to the man page of the perf stat subcommand, the -a option you passed profiles the whole system:
http://man7.org/linux/man-pages/man1/perf-stat.1.html
-a, --all-cpus
system-wide collection from all CPUs (default if no target is
specified)
In this "system-wide" mode, perf stat (and perf record too) counts events on all CPUs in the system (or, for record, profiles all of them). When used without a command argument, perf runs until interrupted by Ctrl-C. With a command argument, perf counts/profiles for as long as the command runs. Typical usage is:
perf stat -a sleep 10 # Profile counting every CPU for 10 seconds
perf record -a sleep 10 # Profile with cycles every CPU for 10 seconds to perf.data
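The "1.000 CPUs utilized" figure on both cores in the question follows from the same behavior. Below is a small sketch of the arithmetic behind that column, using the S0-C0 numbers from the question. It assumes, as the output suggests, that in -a mode the task-clock counter on each CPU keeps running for the whole measurement whether or not the profiled command is scheduled there. The cpus_utilized helper is a hypothetical name used for illustration, not part of perf.

```shell
#!/bin/sh
# Values copied from the question's output for core S0-C0.
task_clock_msec=19004.951263
elapsed_sec=19.004688019

# Hypothetical helper: "CPUs utilized" is task-clock (converted to seconds)
# divided by the elapsed wall-clock time.
cpus_utilized() {
    awk -v t="$1" -v e="$2" 'BEGIN { printf "%.3f\n", (t / 1000) / e }'
}

cpus_utilized "$task_clock_msec" "$elapsed_sec"   # prints 1.000
```

Because every core accumulates roughly the full elapsed time in system-wide mode, each one reports about 1.000 regardless of where python3 or sleep actually ran.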
To get stats for a single command, use single-process profiling (without the -a option):
perf stat python3 test.py
For profiling (perf record), you may run without the -a option; or you may use -a and later do some manual filtering in perf report, focusing only on the pids/tids/dsos of your application. (This can be very useful if the profiled command uses interprocess requests to other daemons to do a lot of CPU work.)
The --per-core, -A, -C <cpulist>, and --per-socket options are only for system-wide -a mode. Try --per-thread with the -p pid attach-to-process option.
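As a rough sketch of that last suggestion, the wrapper below attaches perf stat to an already-running process and counts for a fixed number of seconds. The attach_per_thread function and the DRY_RUN switch are hypothetical names introduced here for illustration. It assumes perf is installed and that you are allowed to attach to the target PID. The trailing sleep command serves only as a timer that tells perf when to stop counting.

```shell
#!/bin/sh
# Hypothetical helper: count events for one running process, broken down per thread.
attach_per_thread() {
    target_pid="$1"   # PID of the process to measure
    seconds="$2"      # how long perf should count before stopping

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command, so it can be checked before running it.
        echo "perf stat --per-thread -p $target_pid sleep $seconds"
    else
        perf stat --per-thread -p "$target_pid" sleep "$seconds"
    fi
}

# Example with the script from the question:
#   python3 test.py &
#   attach_per_thread "$!" 10
```

Unlike the -a run in the question, this reports only the threads of the chosen process, so a single-threaded program should show up as a single row.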