What can I use to profile C++ code in Linux?


Problem Description

I have a C++ application I'm in the process of optimizing. What method or tool can I use to pinpoint where exactly my code is running slowly?

Solution

If your goal is to use a profiler, use one of the suggested ones.

However, if you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.
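
As a sanity check on that arithmetic, here is a minimal simulation sketch (the 30% cost and the halt count are invented for illustration): each random halt is modeled as an independent chance f of landing in the slow spot.

#include <cstdio>
#include <random>

// Hypothetical model: one routine is on the call stack a fraction f of
// the time, so each random halt catches it with probability f.
int main() {
    const double f = 0.30;     // assumed cost of the slow spot
    const int halts = 1000;    // pretend we paused the program 1000 times

    std::mt19937 rng(12345);
    std::bernoulli_distribution onStack(f);

    int hits = 0;
    for (int i = 0; i < halts; ++i)
        if (onStack(rng)) ++hits;

    // The observed fraction of stacks showing the routine converges on f.
    std::printf("seen on %.1f%% of samples (true cost %.0f%%)\n",
                100.0 * hits / halts, 100.0 * f);
}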

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes. This magnification effect, when compounded over multiple problems, can lead to truly massive speedup factors.
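
A quick sketch of how the magnification compounds, with made-up fractions: each problem is stated as the share of the then-current runtime, so each fix multiplies the total speedup by 1/(1-f).

#include <cstdio>

// Hypothetical problems, each given as the fraction of the *current*
// runtime it wastes at the moment it is fixed.
int main() {
    const double fixes[] = {0.30, 0.25, 0.20};
    double time = 1.0;                       // normalized original runtime
    for (double f : fixes) {
        time *= (1.0 - f);
        std::printf("removed a %.0f%% problem: %.3f of original time, "
                    "speedup %.2fx\n", 100 * f, time, 1.0 / time);
    }
    // Three modest fixes compound to 1 / (0.7 * 0.75 * 0.8), about 2.4x.
}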

Caveat: Programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack, and then let you examine a random set of samples. (The summaries are where the insight is lost.) Call graphs don't give you the same information, because

  1. they don't summarize at the instruction level, and
  2. they give confusing summaries in the presence of recursion.

They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find. They will say it sometimes finds things that aren't problems, but that is only true if you see something once. If you see a problem on more than one sample, it is real.

P.S. This can also be done on multi-thread programs if there is a way to collect call-stack samples of the thread pool at a point in time, as there is in Java.

P.P.S. As a rough generality, the more layers of abstraction you have in your software, the more likely you are to find that that is the cause of performance problems (and the opportunity to get speedup).

Added: It might not be obvious, but the stack sampling technique works equally well in the presence of recursion. The reason is that the time that would be saved by removal of an instruction is approximated by the fraction of samples containing it, regardless of the number of times it may occur within a sample.
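
A toy illustration of that point, with invented stacks: a sample counts as containing the routine at most once, no matter how many recursive frames of it the sample holds.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Four hypothetical stack samples; "f" recurses in some of them.
    std::vector<std::vector<std::string>> samples = {
        {"main", "f", "f", "f", "g"},
        {"main", "f", "g"},
        {"main", "h"},
        {"main", "f", "f", "h"},
    };
    int containing = 0;
    for (const auto& stack : samples)
        if (std::find(stack.begin(), stack.end(), "f") != stack.end())
            ++containing;
    // 3 of 4 samples contain f, so removing f would save about 75% of the
    // time, regardless of the number of f frames within each sample.
    std::printf("f appears on %d of %zu samples\n",
                containing, samples.size());
}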

Another objection I often hear is: "It will stop someplace random, and it will miss the real problem". This comes from having a prior concept of what the real problem is. A key property of performance problems is that they defy expectations. Sampling tells you something is a problem, and your first reaction is disbelief. That is natural, but you can be sure if it finds a problem it is real, and vice-versa.

ADDED: Let me make a Bayesian explanation of how it works. Suppose there is some instruction I (call or otherwise) which is on the call stack some fraction f of the time (and thus costs that much). For simplicity, suppose we don't know what f is, but assume it is either 0.1, 0.2, 0.3, ... 0.9, 1.0, and the prior probability of each of these possibilities is 0.1, so all of these costs are equally likely a-priori.

Then suppose we take just 2 stack samples, and we see instruction I on both samples, designated observation o=2/2. This gives us new estimates of the frequency f of I, according to this:

Prior                                    
P(f=x) x  P(o=2/2|f=x) P(o=2/2&&f=x)  P(o=2/2&&f >= x)  P(f >= x)

0.1    1     1             0.1          0.1            0.25974026
0.1    0.9   0.81          0.081        0.181          0.47012987
0.1    0.8   0.64          0.064        0.245          0.636363636
0.1    0.7   0.49          0.049        0.294          0.763636364
0.1    0.6   0.36          0.036        0.33           0.857142857
0.1    0.5   0.25          0.025        0.355          0.922077922
0.1    0.4   0.16          0.016        0.371          0.963636364
0.1    0.3   0.09          0.009        0.38           0.987012987
0.1    0.2   0.04          0.004        0.384          0.997402597
0.1    0.1   0.01          0.001        0.385          1

                  P(o=2/2) 0.385                

The last column says that, for example, the probability that f >= 0.5 is 92%, up from the prior assumption of 60%.

Suppose the prior assumptions are different. Suppose we assume P(f=0.1) is .991 (nearly certain), and all the other possibilities are almost impossible (0.001). In other words, our prior certainty is that I is cheap. Then we get:

Prior                                    
P(f=x) x  P(o=2/2|f=x) P(o=2/2&& f=x)  P(o=2/2&&f >= x)  P(f >= x)

0.001  1    1              0.001        0.001          0.072727273
0.001  0.9  0.81           0.00081      0.00181        0.131636364
0.001  0.8  0.64           0.00064      0.00245        0.178181818
0.001  0.7  0.49           0.00049      0.00294        0.213818182
0.001  0.6  0.36           0.00036      0.0033         0.24
0.001  0.5  0.25           0.00025      0.00355        0.258181818
0.001  0.4  0.16           0.00016      0.00371        0.269818182
0.001  0.3  0.09           0.00009      0.0038         0.276363636
0.001  0.2  0.04           0.00004      0.00384        0.279272727
0.991  0.1  0.01           0.00991      0.01375        1

                  P(o=2/2) 0.01375                

Now it says P(f >= 0.5) is 26%, up from the prior assumption of 0.6%. So Bayes allows us to update our estimate of the probable cost of I. If the amount of data is small, it doesn't tell us accurately what the cost is, only that it is big enough to be worth fixing.
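
Both tables can be reproduced mechanically. This sketch parameterizes the prior and applies Bayes' rule with the likelihood P(o=2/2 | f=x) = x^2 from two independent samples:

#include <cstdio>
#include <vector>

int main() {
    // The ten candidate values of f, as in the tables above.
    const std::vector<double> xs =
        {1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1};

    // P(f >= atX | o=2/2) for a given prior over xs.
    auto posteriorTail = [&](const std::vector<double>& prior, double atX) {
        double total = 0, tail = 0;
        for (size_t i = 0; i < xs.size(); ++i) {
            double joint = prior[i] * xs[i] * xs[i];  // P(o=2/2 & f=x)
            total += joint;                           // accumulates P(o=2/2)
            if (xs[i] >= atX) tail += joint;
        }
        return tail / total;
    };

    std::vector<double> uniform(10, 0.1);      // first table's prior
    std::vector<double> skeptical(10, 0.001);  // second table's prior:
    skeptical[9] = 0.991;                      // nearly certain f = 0.1

    std::printf("uniform prior:   P(f >= 0.5 | o=2/2) = %.3f\n",
                posteriorTail(uniform, 0.5));   // prints 0.922
    std::printf("skeptical prior: P(f >= 0.5 | o=2/2) = %.3f\n",
                posteriorTail(skeptical, 0.5)); // prints 0.258
}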

Yet another way to look at it is called the Rule Of Succession. If you flip a coin 2 times, and it comes up heads both times, what does that tell you about the probable weighting of the coin? The respected way to answer is to say that it's a Beta distribution, with average value (number of hits + 1) / (number of tries + 2) = (2+1)/(2+2) = 75%.
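
In code the estimate is one line: with a uniform prior, the posterior mean after h hits in n tries is (h + 1) / (n + 2).

#include <cstdio>

// Rule of Succession: posterior mean of the hit probability under a
// uniform prior, after `hits` successes in `tries` attempts.
double ruleOfSuccession(int hits, int tries) {
    return double(hits + 1) / double(tries + 2);
}

int main() {
    // 2 heads in 2 flips -> estimated weighting (2+1)/(2+2) = 0.75.
    std::printf("%.2f\n", ruleOfSuccession(2, 2));
}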

(The key is that we see I more than once. If we only see it once, that doesn't tell us much except that f > 0.)

So, even a very small number of samples can tell us a lot about the cost of instructions that it sees. (And it will see them with a frequency, on average, proportional to their cost. If n samples are taken, and f is the cost, then I will appear on nf+/-sqrt(nf(1-f)) samples. Example, n=10, f=0.3, that is 3+/-1.4 samples.)


ADDED, to give an intuitive feel for the difference between measuring and random stack sampling:
There are profilers now that sample the stack, even on wall-clock time, but what comes out is measurements (or hot path, or hot spot, from which a "bottleneck" can easily hide). What they don't show you (and they easily could) is the actual samples themselves. And if your goal is to find the bottleneck, the number of them you need to see is, on average, 2 divided by the fraction of time it takes. So if it takes 30% of time, 2/.3 = 6.7 samples, on average, will show it, and the chance that 20 samples will show it is 99.2%.
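
Those two figures follow from the binomial distribution; here is a quick check, assuming the bottleneck costs f = 0.3 of the time:

#include <cmath>
#include <cstdio>

int main() {
    const double f = 0.30;  // fraction of time the bottleneck costs
    const int n = 20;       // number of stack samples taken

    // Expected number of samples until the bottleneck is seen twice.
    std::printf("average samples needed: %.1f\n", 2.0 / f);  // 6.7

    // P(seen on >= 2 of n samples) = 1 - P(0 hits) - P(1 hit).
    double p = 1.0 - std::pow(1.0 - f, n)
                   - n * f * std::pow(1.0 - f, n - 1);
    std::printf("P(>= 2 hits in %d samples) = %.3f\n", n, p);  // 0.992
}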

Here is an off-the-cuff illustration of the difference between examining measurements and examining stack samples. The bottleneck could be one big blob, or numerous small ones; it makes no difference.

Measurement is horizontal; it tells you what fraction of time specific routines take. Sampling is vertical. If there is any way to avoid what the whole program is doing at that moment, and if you see it on a second sample, you've found the bottleneck. That's what makes the difference - seeing the whole reason for the time being spent, not just how much.
