Run `perf stat` on the output of `perf record`?
With `perf` (the Linux profiler, v4.15.18), I can run `perf stat $COMMAND` to get some simple stats on the command. If I run `perf record`, it saves lots of data to a `perf.data` file.
Can I run `perf stat` on the output of `perf record`, so that I can look at the recorded data but also get a simple overview?
`perf stat` uses the hardware performance monitoring unit in counting mode, while `perf record`/`perf report` with a perf.data file use the same unit in overflow (sampling) mode. In both modes, the hardware performance counters are configured through control registers to count some performance event (for example, CPU cycles or instructions executed), and a counter is incremented on every such event.
In counting mode, `perf stat` sets the counters to zero at program start and reads the final counter values at program exit (in practice the counting may be split into several segments, but the result is the same: a single value for the full run).
In profiling mode (sampling), `perf record` sets the counter to some negative value, for example -100000, and installs an overflow handler (by default the actual period is autotuned to reach a target sampling frequency). Every 100000 events the counter overflows to zero and generates an interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack with `-g`) into a ring buffer, which is then saved to perf.data. The handler also resets the counter to -100000. So, after a long enough run, perf.data holds thousands of samples, which can be used to build a statistical profile of the program (which parts of it ran most often).
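The difference between the two modes can be illustrated with a toy simulation. This is only a sketch of the idea described above, not how the kernel or the PMU is implemented; the function names `run_counting` and `run_sampling` and the event total are made up for the illustration.

```python
def run_counting(total_events):
    """Counting mode: start at zero, read the final value once at exit."""
    counter = 0
    for _ in range(total_events):
        counter += 1
    return counter


def run_sampling(total_events, period):
    """Sampling mode: preload -period and record a sample on every overflow."""
    counter = -period
    samples = []
    for event_index in range(total_events):
        counter += 1
        if counter == 0:                 # overflow: the "interrupt" fires
            samples.append(event_index)  # a real sample stores time, pid, ip
            counter = -period            # re-arm the counter
    return samples


if __name__ == "__main__":
    total = 1_234_567   # hypothetical number of events in the run
    period = 100_000    # same example period as in the text
    exact = run_counting(total)
    samples = run_sampling(total, period)
    estimate = len(samples) * period
    print(f"exact count      : {exact}")
    print(f"samples recorded : {len(samples)}")
    print(f"estimated count  : {estimate}")
```

With these numbers the sampling run records 12 samples, so the estimate is 1,200,000 against an exact count of 1,234,567: events after the last overflow never produce a sample, which is one reason an estimate from samples is only approximate.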
What does `perf stat` show? In default mode on an x86_64 CPU: the running time of the program (task-clock and elapsed time), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters (cycles, instructions, branches, branch-misses):
$ echo '3^123456%3' | perf stat bc
0
Performance counter stats for 'bc':
325.604672 task-clock (msec) # 0.998 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
181 page-faults # 0.556 K/sec
828,234,675 cycles # 2.544 GHz
1,840,146,399 instructions # 2.22 insn per cycle
348,965,282 branches # 1071.745 M/sec
15,385,371 branch-misses # 4.41% of all branches
0.326152702 seconds time elapsed
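The annotations after each `#` in this output can be reproduced from the raw values. The sketch below assumes that the rates are computed relative to task-clock and that "CPUs utilized" is task-clock divided by elapsed time; the dictionary keys are names chosen for this example, not perf identifiers.

```python
# Raw values copied from the perf stat output above.
task_clock_ms = 325.604672
elapsed_s = 0.326152702
page_faults = 181
cycles = 828_234_675
instructions = 1_840_146_399
branches = 348_965_282
branch_misses = 15_385_371

task_clock_s = task_clock_ms / 1000.0


def derived_metrics():
    """Recompute the right-hand column of the perf stat output."""
    return {
        "cpus_utilized": task_clock_s / elapsed_s,
        "page_faults_k_per_sec": page_faults / task_clock_s / 1e3,
        "ghz": cycles / task_clock_s / 1e9,
        "insn_per_cycle": instructions / cycles,
        "branches_m_per_sec": branches / task_clock_s / 1e6,
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>22}: {value:.3f}")
```

None of these ratios can be rebuilt from a default perf.data file, because only one of the inputs (cycles) is recorded there, and only as samples.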
What does `perf record` record? With a single wakeup event (ring buffer overflow) it saved 1293 samples into perf.data, using the default hardware event (cycles):
$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]
With `perf report --header | less`, `perf script` and `perf script -D` you can take a look at the perf.data content:
$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l
1293
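As a rough sanity check, the sample count is close to what a fixed sampling frequency would give. The sketch assumes that the 4000 in the header is a frequency in Hz (not a period) and borrows the task-clock from the separate `perf stat` run shown earlier, so the two numbers come from different runs and only approximate agreement should be expected.

```python
SAMPLE_FREQ_HZ = 4000        # from the perf.data header, assumed to be a frequency
TASK_CLOCK_S = 0.325604672   # task-clock from the separate perf stat run
RECORDED_SAMPLES = 1293      # from the perf record / perf script output


def expected_samples(freq_hz, runtime_s):
    """Number of samples a fixed sampling frequency yields over a run."""
    return freq_hz * runtime_s


if __name__ == "__main__":
    expected = expected_samples(SAMPLE_FREQ_HZ, TASK_CLOCK_S)
    ratio = RECORDED_SAMPLES / expected
    print(f"expected samples : {expected:.0f}")
    print(f"recorded samples : {RECORDED_SAMPLES}")
    print(f"recorded/expected: {ratio:.3f}")
```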
There are some timestamps inside perf.data, plus some additional events for program start and exit (`perf script -D | egrep exec\|EXIT`), but a default perf.data does not contain enough information to fully reconstruct the `perf stat` output. Running time is recorded only as timestamps (of program start and exit, and of every sample), software events are not recorded, and only a single hardware event was used (cycles; no instructions, branches, or branch-misses). The count of the recorded hardware event can be approximated, but the result is not exact (the real cycle count was around 820-825 million):
$ perf report --header |grep Event
# Event count (approx.): 836622729
With a non-default recording into perf.data, more events can be estimated:
$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
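To see how close these estimates are, the header lines can be parsed and compared with the exact counts from the earlier `perf stat` run. This is a minimal sketch: the header text is pasted in from the output above, the helper names are made up for the example, and the two runs were separate, so part of the difference is ordinary run-to-run variation.

```python
import re

# Header lines copied from the perf report output above.
HEADER_TEXT = """\
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
"""

# Exact counts from the separate perf stat run shown earlier.
PERF_STAT_COUNTS = {
    "cycles": 828_234_675,
    "instructions": 1_840_146_399,
    "branches": 348_965_282,
    "branch-misses": 15_385_371,
}


def parse_event_counts(text):
    """Map each event name to its approximate count from the header text."""
    counts = {}
    current_event = None
    for line in text.splitlines():
        match = re.search(r"of event '([^']+)'", line)
        if match:
            current_event = match.group(1)
            continue
        match = re.search(r"Event count \(approx\.\):\s*(\d+)", line)
        if match and current_event is not None:
            counts[current_event] = int(match.group(1))
    return counts


def relative_error_pct(estimate, exact):
    """Signed difference of the estimate from the exact count, in percent."""
    return 100.0 * (estimate - exact) / exact


if __name__ == "__main__":
    for event, estimate in parse_event_counts(HEADER_TEXT).items():
        exact = PERF_STAT_COUNTS[event]
        error = relative_error_pct(estimate, exact)
        print(f"{event:>14}: estimate {estimate:>13,}  exact {exact:>13,}  {error:+.2f}%")
```

For this data, all four estimates fall within one percent of the counted values.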
So, you can't run `perf stat` on a perf.data file, but you can ask `perf report` to print the header with event count estimates. You can also try to parse timestamps from `perf script`/`perf script -D`.
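A rough elapsed-time estimate from sample timestamps could look like the sketch below. The input lines are hypothetical and only imitate the general shape of `perf script` output (a `seconds.microseconds:` field on each line); the real layout depends on the perf version and options, so the regular expression is an assumption to adjust against actual output.

```python
import re

# Hypothetical lines shaped like perf script output; not real measurements.
SAMPLE_LINES = """\
bc  4242 10000.000100:     250000 cycles:uppp:  7f0000001000 main (/usr/bin/bc)
bc  4242 10000.150350:     250000 cycles:uppp:  7f0000001040 main (/usr/bin/bc)
bc  4242 10000.326200:     250000 cycles:uppp:  7f0000001080 main (/usr/bin/bc)
"""

# Assumed timestamp format: whitespace, seconds.fraction, then a colon.
TIMESTAMP_RE = re.compile(r"\s(\d+\.\d+):")


def sample_time_span(text):
    """Seconds between the first and last timestamp found in the text."""
    stamps = []
    for line in text.splitlines():
        match = TIMESTAMP_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
    if not stamps:
        return 0.0
    return max(stamps) - min(stamps)


if __name__ == "__main__":
    print(f"time covered by samples: {sample_time_span(SAMPLE_LINES):.6f} s")
```

The span between the first and last sample only approximates elapsed time, since nothing is sampled before the first overflow or after the last one.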