Inaccuracy in gprof output


Problem Description

I am trying to profile a C++ function using gprof, and I am interested in the %time taken. I did more than one run, and for some reason I got a large difference in the results. I don't know what is causing this; I assume it is the sampling rate, or, as I read in other posts, that I/O has something to do with it. So is there a way to make it more accurate and generate somewhat consistent results?

I was thinking of the following:


  1. Increasing the sampling rate

  2. Flushing the cache before running anything

  3. Using another profiler, but I would like it to produce results in a similar format, i.e. (%time, function name). I tried Valgrind, but it gave me a huge output file.

Awaiting your input.

Regards

Recommended Answer

I recommend printing a copy of the gprof paper and reading it carefully.

According to the paper, here's how gprof measures time. It samples the PC and counts how many samples land in each routine. Multiplied by the time between samples, that gives each routine's total self time.
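As a rough worked example (the 0.01 s figure is gprof's common default sample interval on Linux, an assumption here rather than something stated in the answer): 250 samples landing in a routine are reported as about 250 × 0.01 s = 2.5 s of self time, while a routine whose total cost is well under 0.01 s may collect zero or one samples, so its %time can swing wildly between runs. That sampling granularity is one plausible source of the run-to-run variation described in the question.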

It also records in a table, by call site, how many times routine A calls routine B, assuming routine B is instrumented by the -pg option. By summing those up, it can tell how many times routine B was called.
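As a concrete illustration of the -pg instrumentation and the flat-profile format asked about above, here is a minimal sketch of a typical gprof workflow; the file name, the workload, and the g++ build line are assumptions for the example, not something taken from the answer.

```cpp
// example.cpp -- a hypothetical workload used only to illustrate the workflow
#include <cmath>

double work(int n) {                 // routine we want gprof to attribute time to
    double s = 0.0;
    for (int i = 1; i <= n; ++i)
        s += std::sqrt(static_cast<double>(i));
    return s;
}

int main() {
    double total = 0.0;
    for (int i = 0; i < 2000; ++i)
        total += work(100000);
    return total > 0.0 ? 0 : 1;      // keep the result live so it isn't optimized away
}

// Typical build/run steps (shell), assuming g++:
//   g++ -O2 -pg example.cpp -o example   # -pg inserts the call-count instrumentation
//   ./example                            # writes gmon.out in the working directory
//   gprof -p ./example gmon.out          # flat profile: %time, self seconds, calls, name
//   gprof -q ./example gmon.out          # call graph built from the counted call sites
// Code compiled *without* -pg (e.g. prebuilt libraries) still gets PC samples,
// but its call counts are missing, which is one source of the inaccuracy discussed here.
```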

Starting from the bottom of the call tree (where total time equals self time), it assumes the average time per call of each routine is its total time divided by its number of calls.

Then it works back up to each caller of those routines. The time of each routine is its average self time plus, for each subordinate routine, the average number of calls to that routine times the subordinate routine's average time per call.
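A minimal sketch of that bottom-up propagation rule, assuming a cycle-free call graph and made-up data structures (this illustrates the rule described above, not gprof's actual implementation):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Routine {
    double self_time = 0.0;   // PC samples in this routine * sample interval
    long   calls_in  = 0;     // how many times this routine was called in total
    std::vector<std::pair<std::string, long>> callees;  // (callee name, calls from this routine)
};

// Total time of a routine = self time + sum over callees of
//   (calls made to that callee) * (callee's average time per call).
double total_time(const std::string& name, std::map<std::string, Routine>& g) {
    Routine& r = g[name];
    double t = r.self_time;
    for (auto& [callee, n_calls] : r.callees) {
        Routine& c = g[callee];
        double callee_total = total_time(callee, g);          // bottom-up: leaves first
        double avg_per_call = c.calls_in ? callee_total / c.calls_in : 0.0;
        t += n_calls * avg_per_call;                          // the averaging assumption
    }
    return t;
}
```

The per-call average computed in the loop is exactly where the assumptions discussed next creep in, and a cycle in the call graph would make this simple recursion never terminate.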

You can see how, even when recursion (cycles in the call graph) is not present, this is fraught with possibilities for error: the assumptions about average times and average numbers of calls, and the assumption that subordinate routines are instrumented, which the authors themselves point out. If there is recursion, they basically say "forget it".
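A hypothetical illustration of the averaging problem (the names and numbers are made up for the example): suppose B's cost grows with its argument, A calls B with a small argument many times, and C calls B once with a huge argument. gprof only knows B's total time and the two call counts, so it charges that time to A and C in proportion to calls, even though almost all of it was really spent on behalf of C.

```cpp
#include <cmath>

double B(long n) {                    // cost grows with n
    double s = 0.0;
    for (long i = 0; i < n; ++i) s += std::sqrt(double(i));
    return s;
}

double A() {                          // 1000 cheap calls to B
    double s = 0.0;
    for (int i = 0; i < 1000; ++i) s += B(1000);
    return s;
}

double C() {                          // 1 very expensive call to B
    return B(100000000);
}

int main() {
    return (A() + C()) > 0.0 ? 0 : 1;
}
// gprof's call-graph pass sees only "B: total time T, 1001 calls", so it charges
// A roughly 1000/1001 of T and C roughly 1/1001 of T, although nearly all of T
// was actually spent in the single call from C.
```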

All of this technology, even if it weren't problematic, raises the question: what is its purpose? Usually, the purpose is to find bottlenecks. According to the paper, it can help people evaluate alternative implementations. That isn't finding bottlenecks. The authors recommend looking at routines that seem to be called many times, or that have high average times. Certainly routines with low average cumulative time can be ignored, but that doesn't localize the problem very much. And it completely ignores I/O, as if all the I/O being done were unquestionably necessary.
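For instance (an illustrative sketch, not something from the answer, with a sleep used as a stand-in for blocking I/O): a routine that spends most of its wall-clock time blocked collects almost no PC samples, because samples are taken while the program is actually running on the CPU, so its real cost is invisible in the %time column.

```cpp
#include <chrono>
#include <cmath>
#include <thread>

void waits() {                                    // ~2 s of wall-clock time, ~0 CPU time
    std::this_thread::sleep_for(std::chrono::seconds(2));
}

double computes() {                               // a fraction of a second of pure CPU time
    double s = 0.0;
    for (long i = 0; i < 50000000; ++i) s += std::sqrt(double(i));
    return s;
}

int main() {
    waits();
    double s = computes();
    return s > 0.0 ? 0 : 1;
}
// In a gprof flat profile, computes() gets essentially all of the %time and
// waits() shows close to nothing, even though the program's elapsed time is
// dominated by the wait -- the "ignores I/O" point made above.
```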

So, to try to answer your question: try Zoom, for one, and don't expect to eliminate the statistical noise in the measurements.

gprof is a venerable tool, simple and rugged, but the problems it had in the beginning are still there, and far better tools have come along in the intervening decades. Here's a list of the issues.

