How does a profiler sample a running program?


Problem description

I only have a rough idea about this, so I would like some more practical ideas. Ideas for Linux, Unix, and Windows are all welcome.

The rough idea in my head is:

The profiler sets up some type of timer and a timer interrupt handler in the target process. When the handler takes control, it reads and saves the value of the instruction pointer (IP) register. When sampling is done, it counts the occurrences of every IP value, and then we know the 'top hitters' among all sampled program addresses.

But I do not actually know how to do it. Can someone give me some basic but practical ideas? For example, what kind of timer (or equivalent) is typically used? How is the IP register value read? (I think that when execution enters the profiler's handler routine, the IP points to the entrance of the handler, not to somewhere in the target program, so we cannot simply read the current IP value.)
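
The scheme described above can be tried out in a few lines. The following is a minimal sketch in Python, assuming a Unix-like system: `signal.setitimer` with `ITIMER_PROF` delivers `SIGPROF` as the process consumes CPU time, and the handler is passed the frame that was interrupted, not the handler's own frame. That is the Python-level counterpart of the concern in the parenthesis: the interrupted location is saved and handed to the handler. The names `on_sample`, `busy`, and `profile` are made up for this illustration.

```python
import collections
import signal
import time

# (function name, line number) -> number of samples that landed there
samples = collections.Counter()

def on_sample(signum, frame):
    # `frame` is the Python frame that was interrupted by the signal,
    # so it describes the profiled code, not this handler.
    if frame is not None:
        samples[(frame.f_code.co_name, frame.f_lineno)] += 1

def busy(seconds):
    # Hypothetical workload: spin until `seconds` of CPU time have been used.
    end = time.process_time() + seconds
    count = 0
    while time.process_time() < end:
        count += 1
    return count

def profile(func, *args, interval=0.005):
    # ITIMER_PROF counts CPU time used by the process and sends SIGPROF
    # every `interval` seconds of it. Must be called from the main thread.
    signal.signal(signal.SIGPROF, on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
```

Calling `profile(busy, 0.2)` and then `samples.most_common(3)` lists the lines that were hit most often, which is the "top hitters" count from the question, at Python-line rather than machine-address granularity.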

Thank you for your answer!


Thanks for the answers from Peter Cordes and Mike Dunlavey.

Peter's answer explains how to read the registers and memory of another process. I now realize that the profiler does not have to execute 'inside' the target process; instead, it can read the target's registers and memory from outside using ptrace(2). It does not even have to suspend the target separately, because ptrace stops it anyway.

Mike's answer suggests that, for performance profiling, counting occurrences of stack traces makes more sense than counting IP register values, because the latter can be mostly noise when execution is inside a system module at the moment of sampling.
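
ptrace(2) is the native way to read another process's state. As a loose, portable analogy only (not ptrace, and within one process rather than across two), the sketch below samples a target thread from a separate sampler thread using Python's `sys._current_frames()`, so no handler runs inside the sampled code. The sleep between samples is jittered so that samples do not fall at fixed intervals. The names `sample_thread`, `workload`, `inner`, and `run_sampled` are invented for this illustration.

```python
import collections
import random
import sys
import threading
import time

def sample_thread(target_id, stop, counts, mean_interval=0.005):
    # Runs in its own thread and repeatedly reads the target thread's stack.
    while not stop.is_set():
        # Jittered sleep so samples are taken at pseudo-random times.
        time.sleep(random.uniform(0.5, 1.5) * mean_interval)
        frame = sys._current_frames().get(target_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            # Store outermost caller first, innermost frame last.
            counts[tuple(reversed(stack))] += 1

def inner():
    return sum(i * i for i in range(20000))

def workload(seconds=0.3):
    # Hypothetical workload: call inner() repeatedly for `seconds`.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        inner()

def run_sampled(func):
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_thread, args=(threading.get_ident(), stop, counts)
    )
    sampler.start()
    try:
        func()
    finally:
        stop.set()
        sampler.join()
    return counts
```

`run_sampled(workload)` returns a counter keyed by whole call stacks; the calling thread is the one being sampled.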

Thank you guys so much!

Solution

Good for you for wanting to do this. Advice - don't try to mimic gprof.

What you need to do is sample the call stack, not just the IP, at random or pseudo-random times.

  • First reason - I/O and system calls can be deeply buried in the app and be costing a large fraction of the time, during which the IP is meaningless but the stack is meaningful. ("CPU profilers" simply shut their eyes.)

  • Second reason - Looking at the IP is like trying to understand a horse by looking at the hairs on its tail. To analyze performance of a program you need to know why the time is spent, not just that it is. The stack tells why.
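
To make the difference concrete, here is a small sketch that tallies the same set of samples two ways: by innermost frame only (roughly what counting IP values gives) and by every function that appears anywhere on the stack. The ten stacks and all function names are invented purely for illustration; they are not measurements from any real program.

```python
from collections import Counter

# Hypothetical samples, outermost caller first, innermost frame last.
samples = [
    ("main", "render", "format_row", "strcat"),
    ("main", "render", "format_row", "strcat"),
    ("main", "render", "format_row", "malloc"),
    ("main", "render", "format_row", "strcat"),
    ("main", "load_config", "read"),
    ("main", "render", "format_row", "malloc"),
    ("main", "render", "format_row", "strlen"),
    ("main", "render", "draw"),
    ("main", "render", "format_row", "strcat"),
    ("main", "load_config", "read"),
]

def leaf_counts(stacks):
    # IP-style view: only the innermost frame of each sample is counted.
    return Counter(stack[-1] for stack in stacks)

def inclusive_fraction(stacks):
    # Stack view: fraction of samples in which a function is on the stack.
    counts = Counter()
    for stack in stacks:
        for name in set(stack):  # count once per sample, even if recursive
            counts[name] += 1
    return {name: n / len(stacks) for name, n in counts.items()}

print(leaf_counts(samples).most_common())
print(sorted(inclusive_fraction(samples).items(), key=lambda kv: -kv[1]))
```

In this made-up data the leaf view spreads the time over `strcat`, `malloc`, `read`, and so on, while the stack view shows that `format_row` is on the stack in 7 of the 10 samples, which points at the caller responsible for the time.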

Another problem with gprof is it made people think you need lots of samples - the more the better - for statistical precision. But that assumes you're looking for needles in a haystack, the removal of which saves next to nothing - in other words you assume (attaboy/girl programmer) there's nothing big in there, like a cow under the hay. Well, I've never seen software that didn't have cows in the hay, and it doesn't take a lot of samples to find them.
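
A rough way to check the claim about sample counts is a binomial model. Assume, for this sketch only, that samples are independent and that a problem is on the stack for a fraction `f` of the run, and take "seen on two or more samples" as a working definition of found. The function name `p_at_least` is invented here.

```python
from math import comb

def p_at_least(k, n, f):
    """Probability that at least k of n independent samples land in code
    that is on the stack for fraction f of the run (binomial model)."""
    p_fewer = sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k))
    return 1.0 - p_fewer

for f in (0.5, 0.3, 0.1, 0.01):
    p = p_at_least(2, 20, f)
    print(f"f={f:<5} P(seen on >= 2 of 20 samples) = {p:.3f}")
```

Under these assumptions, something costing 30% of the time shows up on at least two of 20 samples with probability about 0.99, while something costing 1% does so with probability below 0.02. Large sample counts are only needed for the small items.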

How to get samples: having a timer interrupt and reading the stack (in binary) is just a technical problem. I figured out how to do it a long time ago. So can you. Every debugger does it. But to turn it into code names and locations requires a map file or something like it, which usually means a debug build (not optimized). You can get a map file from optimized code, but the optimizer has scrambled the code so it's hard to make sense of.
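
The map-file step amounts to finding which function's address range contains each sampled address. A minimal sketch, assuming a table of function start addresses sorted in ascending order; the addresses and names below are hypothetical, and a real map file or symbol table would also give sizes or end addresses.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x401000, "main"),
    (0x401080, "render"),
    (0x401200, "format_row"),
]
STARTS = [start for start, _ in SYMBOLS]

def symbolize(address):
    # Find the last symbol whose start address is <= address.
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    start, name = SYMBOLS[i]
    # Without end addresses, anything past the last start is attributed
    # to the last symbol; a real tool would check the symbol's size.
    return f"{name}+0x{address - start:x}"

print(symbolize(0x401084))
```

Applying `symbolize` to every return address in a sampled stack turns the raw binary stack into a readable list of function names and offsets.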

Is it worthwhile taking samples in non-optimized code? I think so, because there are two kinds of speedups, the ones the compiler can do, and the ones you can do but the compiler can't. The latter are the cows. So what I and many other programmers do first is performance tuning on un-optimized code using random sampling. When all the cows are out, turn on the optimizer and let the compiler do its magic.
