If profiler is not the answer, what other choices do we have?

Question

After watching Joshua Bloch's presentation "Performance Anxiety", I read the paper he recommended in it, "Evaluating the Accuracy of Java Profilers". Quoting the conclusion:

Our results are disturbing because they indicate that profiler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—-and significant—all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems

The conclusion of the paper is that we cannot really trust the results of profilers. But then, what is the alternative to using profilers? Should we go back to optimizing by gut feeling?

UPDATE: A point that seems to be missed in the discussion is the observer effect. Can we build a profiler that is truly free of the observer effect?

Recommended answer

Oh, man, where to begin?

First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad. The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated. Mistakes are common because of some persistent myths about performance profiling.

But let's be positive. If one wants to find opportunities for speedup, it is really very simple:

  • Sampling should be uncorrelated with the state of the program.
    That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.

  • Sampling should read the function call stack,
    so as to determine which statements were "active" at the time of the sample. The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack. (Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)

  • Reporting should show percent by line (not by function).
    If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
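
To make the three points concrete, here is a minimal sketch in Python (my choice of language, not something the paper or the answer uses). The names `sample_stacks` and `percent_by_line` are made up for illustration. A helper thread wakes at exponentially distributed intervals, reads the whole call stack of the working thread, and the report gives, per source line, the percent of samples in which that line was on the stack. One caveat, stated loudly: in CPython the helper thread only runs when the interpreter lets it, so these samples are not perfectly uncorrelated with program state; a sampler driven from outside the process would be closer to the ideal.

```python
import random
import sys
import threading
from collections import Counter


def sample_stacks(work, mean_interval=0.005, seed=None):
    """Run work() on the calling thread while a helper thread samples its stack.

    Returns a list of samples. Each sample is a list of
    (filename, line number, function name) tuples, innermost frame first.
    """
    rng = random.Random(seed)
    target_id = threading.get_ident()
    samples = []
    done = threading.Event()

    def sampler():
        # Exponential gaps: the next sample time does not depend on what
        # the program is doing. wait() returns True once work() has finished.
        while not done.wait(rng.expovariate(1.0 / mean_interval)):
            frame = sys._current_frames().get(target_id)
            stack = []
            while frame is not None:
                code = frame.f_code
                stack.append((code.co_filename, frame.f_lineno, code.co_name))
                frame = frame.f_back
            if stack:
                samples.append(stack)

    helper = threading.Thread(target=sampler, daemon=True)
    helper.start()
    try:
        work()
    finally:
        done.set()
        helper.join()
    return samples


def percent_by_line(samples):
    """Percent of samples in which each line appears anywhere on the stack."""
    hits = Counter()
    for stack in samples:
        hits.update(set(stack))  # count a line once per sample, even if recursive
    total = len(samples)
    if total == 0:
        return {}
    return {site: 100.0 * n / total for site, n in hits.most_common()}
```

Calling `percent_by_line(sample_stacks(my_workload))`, where `my_workload` is whatever function you want to examine, gives a dictionary ordered from the most frequently seen line to the least. Lines near the top that are call sites you could avoid are the candidates for speedup.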

An almost universal mistake (one the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location. For example, in one case of performance tuning, a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times. It was not essential to know precisely the size of each problem before fixing it, but it was essential to know its location. A phenomenon of performance tuning is that fixing one problem, by reducing the total time, magnifies the percentages of the remaining problems, so they are easier to find. As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems. It is not essential to fix them in decreasing order of size, but it is essential to pinpoint them.
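
The magnification effect is just arithmetic, and a small sketch may help. The numbers below are hypothetical, chosen only for illustration; they are not the figures behind the 43 times speedup. The function name `after_fixing` is likewise made up.

```python
def after_fixing(costs, fixed):
    """Return (percent share of each remaining item, speedup so far).

    costs maps a label to the seconds it accounts for;
    fixed is the set of labels whose cost has been eliminated.
    """
    left = {name: t for name, t in costs.items() if name not in fixed}
    total_left = sum(left.values())
    shares = {name: 100.0 * t / total_left for name, t in left.items()}
    return shares, sum(costs.values()) / total_left


# Hypothetical program: 100 s total, of which only 10 s is essential work.
costs = {"A": 50, "B": 25, "C": 15, "essential": 10}

print(after_fixing(costs, set()))            # B is 25%, C is 15%
print(after_fixing(costs, {"A"}))            # B is now 50%, C is 30%, speedup 2x
print(after_fixing(costs, {"A", "B"}))       # C is now 60%, speedup 4x
print(after_fixing(costs, {"A", "B", "C"}))  # speedup 10x
```

Problem C starts at 15% and would be easy to miss, but after A and B are fixed it accounts for 60% of what is left, and a handful of samples will show it.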

On the subject of statistical accuracy of measurement, if a call point is on the stack some percent of time F (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point is a binomial distribution, with mean = NF = 20, standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%. So is that accurate? Not really, but has the problem been found? Precisely.
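
For anyone who wants to check that arithmetic, here is a short sketch. It computes the binomial mean and standard deviation for F = 0.2 and N = 100, then simulates many profiling runs to show the same spread empirically. The function names are made up for illustration, and the simulation assumes each sample is an independent draw, which is what random-time sampling is meant to approximate.

```python
import math
import random
import statistics


def binomial_mean_sd(n, f):
    """Mean and standard deviation of the number of samples showing the call point."""
    return n * f, math.sqrt(n * f * (1 - f))


def simulate_hit_counts(n, f, runs, seed=0):
    """Simulate `runs` profiling sessions of n samples each; return hits per session."""
    rng = random.Random(seed)
    return [sum(rng.random() < f for _ in range(n)) for _ in range(runs)]


mean, sd = binomial_mean_sd(100, 0.2)
counts = simulate_hit_counts(100, 0.2, runs=2000)
print(f"theory:    mean={mean:.1f}, sd={sd:.1f}")
print(f"simulated: mean={statistics.mean(counts):.1f}, sd={statistics.pstdev(counts):.1f}")
```

Either way, the call point shows up on roughly 16 to 24 of the 100 samples in a typical run, which is far too many to overlook.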

In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly. (Specifically, it follows a beta distribution. If you generate 4 uniform random numbers on (0,1) and sort them, the distribution of the 3rd one is the distribution of cost for that call point. Its mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.) INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
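
Those two numbers can be checked with a short sketch, assuming a uniform prior on the cost x, so that seeing the call point on 2 of 3 samples gives a Beta(3, 2) posterior. Removing a problem of cost x speeds the program up by 1/(1-x), which is 1 plus a BetaPrime(3, 2) variable, and its mean works out to (a+b-1)/(b-1). The function names are made up for illustration. The expected speedup is computed from the closed form and not by simulation, because 1/(1-x) has a very heavy tail here and a simulated average converges poorly.

```python
import math
import random


def posterior(hits, samples):
    """Beta(a, b) parameters for the cost, given a uniform prior."""
    return hits + 1, samples - hits + 1


def expected_cost(a, b):
    """Mean of Beta(a, b): the expected fraction of time saved."""
    return a / (a + b)


def expected_speedup(a, b):
    """E[1/(1-x)] for x ~ Beta(a, b); infinite when b <= 1."""
    return (a + b - 1) / (b - 1) if b > 1 else math.inf


def third_of_four_uniforms(runs, seed=0):
    """Empirical mean of the 3rd smallest of 4 uniform(0,1) numbers."""
    rng = random.Random(seed)
    return sum(sorted(rng.random() for _ in range(4))[2] for _ in range(runs)) / runs


a, b = posterior(hits=2, samples=3)
print(a, b)                            # Beta(3, 2)
print(expected_cost(a, b))             # 0.6
print(expected_speedup(a, b))          # 4.0
print(third_of_four_uniforms(100000))  # close to 0.6
```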

It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.

Disclaimer - the paper failed to reference my article: Dunlavey, "Performance tuning with instruction-level cost derived from call-stack sampling", ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
