Run `perf stat` on the output of `perf record`?


Problem description

With perf (the Linux profiler, v4.15.18), I can run `perf stat $COMMAND` to get some simple stats on the command. If I run `perf record`, it saves lots of data to a perf.data file.

Can I run `perf stat` on the output of `perf record`, so that I can look at the recorded data but also get a simple overview?

Solution

`perf stat` uses the hardware performance monitoring unit in counting mode, while `perf record`/`perf report` (with the perf.data file) use the same unit in overflow mode. In both modes, a control register configures each hardware performance counter to track one kind of performance event (for example CPU cycles or instructions executed), and the counter is incremented on every such event.

In counting mode, `perf stat` sets the counters to zero at program start and reads the final counter values at program exit. (In practice the counting may be split into several segments, but the result is the same: a single value for the full run.)

In profiling mode (sampling profiling), `perf record` sets the counter to some negative value, for example -100000, and installs an overflow handler (the actual value is autotuned to reach a target sampling frequency). Every 100000 events the counter overflows to zero and generates an interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack with -g) into a ring buffer, which is later saved to perf.data. The handler also resets the counter to -100000. After a long enough run, perf.data holds thousands of samples, which can be used to build a statistical profile of the program (which parts of the program ran more often).
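The overflow arithmetic can be sketched with plain shell integer math. This is a toy model, not a perf invocation: `simulate_sampling` is a hypothetical helper, the period of 100000 is the illustrative value used above, and the cycle count is the one reported by the `perf stat` run in this answer. Real runs autotune the period, so the sample count differs.

```shell
#!/bin/sh
# Toy model of sampling mode; it does not touch perf or the PMU.
# simulate_sampling TOTAL_EVENTS PERIOD  -> prints "<samples> <estimated events>"
simulate_sampling() {
    total=$1
    period=$2
    # One sample is taken each time the counter has advanced by a full period.
    samples=$(( total / period ))
    # Only whole periods are accounted for; the remainder is lost to the profile.
    estimate=$(( samples * period ))
    echo "$samples $estimate"
}

# 828234675 cycles (from the perf stat output) with a fixed period of 100000.
simulate_sampling 828234675 100000
```

With these inputs the model yields 8282 samples and an estimate of 828200000 events, which illustrates why a count rebuilt from samples is close to, but not the same as, the counted value.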

What does `perf stat` show? In default mode on an x86_64 CPU: the running time of the program (task-clock and elapsed time), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters (cycles, instructions, branches, branch-misses):

$ echo '3^123456%3' | perf stat bc
0
 Performance counter stats for 'bc':
        325.604672      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               181      page-faults               #    0.556 K/sec                  
       828,234,675      cycles                    #    2.544 GHz                    
     1,840,146,399      instructions              #    2.22  insn per cycle         
       348,965,282      branches                  # 1071.745 M/sec                  
        15,385,371      branch-misses             #    4.41% of all branches        
       0.326152702 seconds time elapsed

What does `perf record` record? In a single wakeup to write out the ring buffer it saved 1293 samples into perf.data, using the default hardware event (cycles):

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

With `perf report --header | less`, `perf script`, and `perf script -D` you can inspect the contents of perf.data:

$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l 
1293

perf.data contains some timestamps and some additional events for program start and exit (`perf script -D |egrep exec\|EXIT`), but a default perf.data does not hold enough information to fully reconstruct the `perf stat` output. Running time is recorded only indirectly, as the timestamps of process start, process exit, and each sample; software events are not recorded; and only a single hardware event is sampled (cycles; no instructions, branches, or branch-misses). The count of the sampled hardware event can be approximated, but not exactly (the real cycle count was around 820-825 million):

$ perf report --header |grep Event
# Event count (approx.): 836622729
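As far as I know, this approximate count is the sum of the periods of the recorded samples. A minimal sketch of that sum, assuming `perf script -F period` prints one period per sample on your perf version; `sum_periods` is a hypothetical helper and the three numbers fed to it are made up:

```shell
#!/bin/sh
# sum_periods: add up one sample period per input line (first field).
sum_periods() {
    # %.0f avoids integer clamping that some awk builds apply to %d.
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Intended use, in a directory that contains perf.data:
#   perf script -F period 2>/dev/null | sum_periods
# Stand-in input so the sketch runs without perf: three made-up periods.
printf '%s\n' 650000 647000 652000 | sum_periods
```

If the assumption holds, the value printed for a real perf.data should be close to the "Event count (approx.)" line in the header.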

With a non-default `perf record` invocation that samples several events, more counts can be estimated:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
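These header lines can be folded into a rough, `perf stat`-like summary. The sketch below is an assumption-laden convenience, not a perf feature: `summarize_header` is a hypothetical helper that relies on the header layout shown above (a `# Samples:` line naming the event, followed by its `# Event count` line), and the ratios it prints inherit the sampling error.

```shell
#!/bin/sh
# summarize_header: read `perf report --header` text on stdin and print each
# event with its approximate count, plus two derived ratios.
summarize_header() {
    awk -F"'" '
        /^# Samples:/ { name = $2 }      # event name sits between the quote marks
        /^# Event count/ {
            n = split($0, parts, " ")    # last whitespace-separated field is the count
            count[name] = parts[n]
            printf "%-14s %s\n", name, parts[n]
        }
        END {
            if (count["cycles"] > 0)
                printf "insn per cycle %.2f\n", count["instructions"] / count["cycles"]
            if (count["branches"] > 0)
                printf "branch-miss %% %.2f\n", 100 * count["branch-misses"] / count["branches"]
        }'
}

# Header lines copied from the run above; with a real perf.data you would use
#   perf report --header | summarize_header
summarize_header <<'EOF'
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
EOF
```

The helper matches event names literally, so a default recording whose event is named 'cycles:uppp' would list its count but print no ratios.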

So, you can't run `perf stat` on a perf.data file, but you can ask `perf report` to print the header with its event count estimates. You can also try to parse timestamps from `perf script` or `perf script -D`.
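A rough sketch of the timestamp idea, assuming `perf script -F time` prints one timestamp per sample in the form `12345.678901:` and in time order. `elapsed_from_times` is a hypothetical helper, and it measures only the span from the first to the last sample, not the full process lifetime:

```shell
#!/bin/sh
# elapsed_from_times: read timestamps like "12345.678901:" (one per line) and
# print the difference between the last and the first, in seconds.
elapsed_from_times() {
    awk '{
            gsub(/:/, "", $1)            # drop the trailing colon
            t = $1 + 0
            if (NR == 1) first = t
            last = t
         }
         END { if (NR > 0) printf "%.6f\n", last - first }'
}

# Intended use, in a directory that contains perf.data:
#   perf script -F time 2>/dev/null | elapsed_from_times
# Stand-in input so the sketch runs without perf: three made-up timestamps.
printf '%s\n' '  100.000000:' '  100.250000:' '  100.326000:' | elapsed_from_times
```

For a more faithful elapsed time, the exec and EXIT records shown by `perf script -D` would be the better source, at the cost of parsing a much noisier dump.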
