If a profiler is not the answer, what other choices do we have?


Question

After watching Joshua Bloch's presentation "Performance Anxiety", I read the paper he recommended in it, "Evaluating the Accuracy of Java Profilers". Quoting the conclusion:


Our results are disturbing because they indicate that profiler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—-and significant—all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems

The conclusion of the paper is that we cannot really trust the results of profilers. But then, what is the alternative to using profilers? Should we go back to optimizing by gut feeling?

UPDATE: A point that seems to be missed in the discussion is the observer effect. Can we build a profiler that is really free of the observer effect?

Recommended answer

Oh, man, where to begin?

First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad. The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated. Mistakes are common because of some persistent myths about performance profiling.

But let's be positive. If one wants to find opportunities for speedup, it is really very simple:


  • Sampling should be uncorrelated with the state of the program.
    That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.

  • Sampling should read the function call stack,
    so as to determine which statements were "active" at the time of the sample. The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack. (Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)

  • Reporting should show percent by line (not by function).
    If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
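
To make the three points above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's profiler or any real tool: the names `sample_stack` and `line_percentages` and the file names in the sample data are hypothetical, and `sys._current_frames()` is a CPython-specific helper. The sketch assumes some other thread calls `sample_stack` at random times; the cost of a line is then estimated as the percent of samples in which that line appears anywhere on the stack.

```python
import sys
import traceback
from collections import Counter


def sample_stack(thread_id):
    """Return one thread's call stack as a list of (file, line) pairs.

    Assumption: the caller invokes this at random times from a separate
    thread, so that samples are uncorrelated with program state.
    """
    frame = sys._current_frames()[thread_id]
    return [(f.filename, f.lineno) for f in traceback.extract_stack(frame)]


def line_percentages(samples):
    """Percent of samples in which each (file, line) is on the stack."""
    total = len(samples)
    if total == 0:
        return {}
    counts = Counter()
    for stack in samples:
        # Count a line once per sample, even if recursion repeats it.
        counts.update(set(stack))
    return {site: 100.0 * n / total for site, n in counts.items()}


# Hypothetical data: each entry is the stack seen at one random instant.
samples = [
    [("main.py", 10), ("db.py", 42)],
    [("main.py", 10), ("db.py", 42)],
    [("main.py", 10), ("render.py", 7)],
    [("main.py", 12)],
]
report = line_percentages(samples)
for site, pct in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.1f}%  {site[0]}:{site[1]}")
```

Reporting per line rather than per function means the call site `main.py:10` shows up directly with its inclusive cost, instead of being hidden inside a function total.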

An almost universal mistake (one the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location. For example, in one case of performance tuning a series of performance problems was identified and fixed, resulting in a compounded speedup of 43 times. It was not essential to know precisely the size of each problem before fixing it, but to know its location. A phenomenon of performance tuning is that fixing one problem, by reducing the total time, magnifies the percentages of the remaining problems, so they are easier to find. As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems. It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.
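
The magnification effect can be checked with a little arithmetic. The sketch below uses made-up numbers (a 100-second run with three avoidable problems named A, B and C); they are not the figures behind the 43x example, and the function name `tune` is hypothetical.

```python
def tune(total, problems):
    """Fix problems one at a time and report how the rest are magnified.

    total: run time before tuning.
    problems: {name: time that problem wastes}.
    Returns a list of (fixed_name, remaining_percentages, speedup_so_far).
    """
    original = total
    remaining = dict(problems)
    history = []
    # Fix in dictionary order; the final speedup does not depend on the order.
    for name in list(remaining):
        total -= remaining.pop(name)
        percents = {k: 100.0 * v / total for k, v in remaining.items()}
        history.append((name, percents, original / total))
    return history


# Hypothetical program: 100 s, of which three problems waste 87.5 s in total.
history = tune(100.0, {"A": 50.0, "B": 25.0, "C": 12.5})
for name, percents, speedup in history:
    print(f"fixed {name}: remaining {percents}, speedup so far {speedup:.1f}x")
```

With these numbers, B starts at 25% of the run but becomes 50% once A is fixed, and C grows from 12.5% to 50% after A and B are gone, so each remaining problem gets easier to spot in a handful of samples.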

On the subject of statistical accuracy of measurement, if a call point is on the stack some fraction F of the time (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point follows a binomial distribution, with mean = NF = 20 and standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%. So is that accurate? Not really, but has the problem been found? Precisely.
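
The same binomial arithmetic as a small Python sketch, for trying other values of N and F. The function name `sample_count_stats` is just an illustrative choice.

```python
import math


def sample_count_stats(n, f):
    """Mean and standard deviation of the number of samples (out of n)
    that show a call point which is on the stack a fraction f of the time."""
    mean = n * f
    sd = math.sqrt(n * f * (1 - f))
    return mean, sd


mean, sd = sample_count_stats(100, 0.20)
print(f"N=100, F=20%: {mean:.0f} +/- {sd:.0f} samples")

# Far fewer samples still locate a problem of this size, just less precisely.
mean10, sd10 = sample_count_stats(10, 0.20)
print(f"N=10,  F=20%: {mean10:.1f} +/- {sd10:.2f} samples")
```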

In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly. (Specifically, its cost follows a beta distribution. If you generate 4 uniform random numbers between 0 and 1 and sort them, the distribution of the 3rd one is the distribution of cost for that call point. Its mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.) INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
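
Both numbers can be reproduced exactly. The sketch below assumes a uniform prior on the cost X, so that seeing the call point on k of n samples gives X ~ Beta(k+1, n-k+1). The speedup from removing the problem is 1/(1-X) = 1 + X/(1-X), where X/(1-X) follows BetaPrime(k+1, n-k+1) with mean (k+1)/(n-k) when n-k is at least 1. The function names are hypothetical.

```python
from fractions import Fraction


def expected_cost(hits, samples):
    """Mean of Beta(hits + 1, samples - hits + 1): the expected fraction of
    time saved, assuming a uniform prior on the cost."""
    return Fraction(hits + 1, samples + 2)


def expected_speedup(hits, samples):
    """Mean of 1 / (1 - X) for X ~ Beta(hits + 1, samples - hits + 1).

    Equals 1 + (hits + 1) / (samples - hits) = (samples + 1) / (samples - hits).
    The mean is finite only if at least one sample misses the problem.
    """
    misses = samples - hits
    if misses < 1:
        raise ValueError("expected speedup is unbounded when every sample hits")
    return Fraction(samples + 1, misses)


print(expected_cost(2, 3))     # cost seen on 2 of 3 samples
print(expected_speedup(2, 3))  # average speedup from removing it
```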

It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.

Disclaimer - the paper failed to reference my article: Dunlavey, "Performance tuning with instruction-level cost derived from call-stack sampling", ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
