了解性能报告 [英] Understanding the perf report
问题描述
我一直在从事一些对时间敏感的项目.由于时间上一些意外的尖峰,我不得不更深入一些.
I had been working on some time-sensitive project. Because of some undesired spikes in the timing, I had to go a bit deeper.
场景:
我有一个固定在CPU内核上的内核模块.该CPU内核也在内核引导参数的isolcpus中列出.这是我对cmdline中的内核引导参数所做的
I have a kernel module, which is pinned to a CPU core. This CPU core is also listed in isolcpus in the kernel boot parameters. Here's what I have done to kernel boot parameters in cmdline
intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_idle.max_cstate=0 processor.max_cstate=0 nohz_full=7-11 isolcpus=7-11 mce=off rcu_nocbs=7-11 nosoftlockup idle=poll cpuidle.off=1 powersave=off nonmi_ipi nowatchdog
我运行了以下命令(此刻我正在尝试仅分析CPU 8)
I ran the following command ( I am trying to profile just CPU 8 at this moment)
sudo ./perf record -e context-switches -a -g --cpu=8 taskset -c 9 ./test.sh
**编辑1-其他信息**
**EDIT 1 - Additional Information **
内核版本:4.15.12
Kernel Version: 4.15.12
我的内核模块每X个时间单位发送一次同步数据包.目前,我已将其配置为每50毫秒发送一次.
My Kernel Module sends synchronous packets every X time units. Currently, I have configured it to send it every 50ms.
在这种情况下,我简化了test.sh.它需要几个参数,但是关于此脚本的重要一点是它调用了内核模块.
I had simplified test.sh in this case. It takes several parameters, but, an important thing about this script is that it invokes the Kernel module.
例如, 我的KM的处理程序为fs. 在此proc fs上触发写事件时,它将创建一个新的Kthread,将其绑定到CPU(8),并开始每50毫秒生成一个数据包.
For instance, My KM had a proc fs. When a write event is triggered on this proc fs, it creates a new Kthread, binds it to CPU (8), and starts generating packet every 50ms.
为了避免冲突和上下文切换,我已将其移至内核空间.另外,我将脚本的相似性设置为与内核模块不同的CPU.
To avoid collision and context-switches, I had moved this thing to the kernel space. Also, I had set the affinity of my script to a different CPU than the kernel module.
因此,我观察到的是,发送时间中存在一些抖动,可能是由于这些上下文切换造成的.
Thus, what I have observed is, there is a bit of jitter in the sending times, possibly because of these context switches.
这是输入perf report
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8 of event 'context-switches'
# Event count (approx.): 39
#
# Children Self Command Shared Object Symbol
# ........ ........ ........... ................ .................
#
69.23% 69.23% :-1 [kernel.vmlinux] [k] do_task_dead
|
---do_task_dead
25.64% 25.64% swapper [kernel.vmlinux] [k] schedule_idle
|
---schedule_idle
2.56% 2.56% :2100 [kernel.vmlinux] [k] _cond_resched
|
---_cond_resched
2.56% 2.56% kworker/8:1 [kernel.vmlinux] [k] schedule
|
---schedule
它说有8个上下文切换.另外,我无法理解第一行do_task_dead()
行的Command
列中的:-1实际含义.如果有人能提供我一些指导,让我更深入地研究这个问题,那就太好了.
It says that there have been 8 context-switches. Also, I could not understand what :-1 actually meant in the Command
column of first do_task_dead()
row. It would be great if anyone would provide me some directions in digging deeper into this issue.
编辑2-性能脚本报告和cpu_idle分析结果
swapper 0 [008] 64409.434193: 1 context-switches:
aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)
:-1 -1 [008] 64410.434267: 1 context-switches:
2ac066 do_task_dead (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 64410.442240: 1 context-switches:
aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)
:29026 29026 [008] 64411.442313: 1 context-switches:
acee0d _cond_resched (/lib/modules/4.15.12/build/vmlinux)
kworker/8:1 181 [008] 64411.442318: 1 context-switches:
acebf2 schedule (/lib/modules/4.15.12/build/vmlinux)
:-1 -1 [008] 64411.442327: 1 context-switches:
2ac066 do_task_dead (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 64411.466238: 8 context-switches:
aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 64414.538207: 31 context-switches:
aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)
运行power:cpu_idle事件,这是perf脚本的输出
running with power:cpu_idle event, here is the output of perf script
swapper 0 [008] 65787.514565: power:cpu_idle: state=4294967295 cpu_id=8
ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65788.514653: power:cpu_idle: state=0 cpu_id=8
ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65788.522618: power:cpu_idle: state=4294967295 cpu_id=8
ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65789.522693: power:cpu_idle: state=0 cpu_id=8
ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65789.546577: power:cpu_idle: state=4294967295 cpu_id=8
ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65790.546648: power:cpu_idle: state=0 cpu_id=8
ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
swapper 0 [008] 65790.570574: power:cpu_idle: state=4294967295 cpu_id=8
ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
....
和perf report
显示
# Samples: 22 of event 'power:cpu_idle'
# Event count (approx.): 22
#
# Children Self Trace output
# ........ ........ .........................
#
50.00% 50.00% state=0 cpu_id=8
|
---cpu_idle_poll
50.00% 50.00% state=4294967295 cpu_id=8
|
---cpu_idle_poll
谢谢
Coshal.
推荐答案
祖兰非常接近,他的所有建议都应遵循.
Zulan is quite close and all his suggestions should be followed through.
在perf report
的手册页中:
perf report
输出中的命令列是指从中收集样品的过程.在每个线程/每个进程模式下,这始终是受监视命令的名称.但是在cpu范围模式下,命令可能会有所不同.
The command column in the perf report
output refers to the process from which the samples were collected. In per-thread/per-process mode, this is always the name of the monitored command. But in cpu-wide mode, the command can vary.
因为您正在测量per-cpu
上下文切换事件,如从perf record ... -cpu=8...
命令可以看到的,所以perf report命令将开始报告下一个/上一个任务的PID/TID.您可以在这里看到它-
Because you are measuring per-cpu
context switch events as can be seen from your perf record ... -cpu=8...
command, the perf report command will start to report PID/TID of next/prev task. You can see this here -
现在值-1表示已死的进程,即该进程已超出僵尸进程的状态.这意味着task_struct
成员现在指向已经释放的内存,并且不应该取消引用.do_task_dead
方法应该清楚地反映出这一点.
这是该过程将以-1的形式返回,并同时在perf report
中报告.
Now the value -1 refers to a process which is dead, i.e. the process has gone beyond the state of being a zombie process. This means the task_struct
members now point to already freed memory and no dereferencing should be allowed.The do_task_dead
method should clearly reflect this.
This is where the pid of the process is being returned as -1 and simultaneously reported in perf report
.
有广泛的讨论关于这个问题.最初,值0用来表示perf report
输出中的这种过程状态,但是您可能已经猜到了,pid = 0表示了 idle thread (空闲线程),因此使用值-1.
There is an extensive discussion on this problem. Initially a value of 0 used to refer to such a process state in the perf report
output but as you may have guessed, pid=0 refers to the idle thread and hence a value of -1 is used.
这篇关于了解性能报告的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!