How to analyze program running time


Problem description

I am trying to optimize a C++ program's performance and reduce its run time. However, I am having trouble figuring out where the bottleneck is.

The time command shows that the program itself takes about 5 minutes to run, and of those 5 minutes, user CPU time accounts for 4.5 minutes.

CPU profilers (both the gcc profiler and Google perftools) show that the function calls take only 60 seconds of CPU time in total. I also tried using the profiler to sample real time instead of CPU time, and it gave me similar results.

An I/O profiler (I used ioapps) also shows that I/O takes only about 30 seconds of the program's running time.

So basically I have 3.5 minutes (the largest part of the program's running time) unaccounted for, and I believe that is where the bottleneck is.

What did I miss, and how do I find out where that time goes?

Recommended answer

As Öö Tiib suggested, just break the program in a debugger. The way I do it is to get the program running, switch to the output window, type Ctrl-C to interrupt the program, switch back to the GDB window, type "thread 1" so as to be in the context of the main program, and type "bt" to see the stack trace.
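The manual procedure above can also be scripted. The following is a minimal sketch, assuming gdb is installed, the target program is already running, and you have permission to attach to it. The helper names `gdb_backtrace_command` and `take_stack_sample` are hypothetical. It relies on gdb's `-p`, `-batch`, and `-ex` options to attach to the process, switch to thread 1, print the backtrace, and exit.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints one
    backtrace of thread 1, and then exits, detaching from the process."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread 1", "-ex", "bt"]


def take_stack_sample(pid, timeout_s=30):
    """Run gdb once and return its text output, i.e. one stack sample."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout
```

Calling `take_stack_sample(pid)` a handful of times, a few seconds apart, and reading each backtrace by hand serves the same purpose as pressing Ctrl-C in the debugger.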

Now, look at the stack trace and understand it, because while the instruction at the program counter is responsible for that particular cycle being spent, so is every call on the stack.

If you do this a few times, you're going to see exactly what line is responsible for the bottleneck. As soon as you see it on two (2) samples, you've nailed it. Then fix it and do it all again, finding the next bottleneck, and so on. You could easily find that you get enormous speedup this way.

<flame>

Some people say this is exactly what profilers do, only they do it better. That's what you hear in lecture halls and on blogs, but here's the deal: there are ways to speed up your code that do not reveal themselves as "slow functions" or "hot paths", for example, reorganizing the data structure. Every function looks more or less innocent, even if it has a high inclusive time percent.

They do reveal themselves if you actually look at stack samples. So the problem with good profilers is not in the collection of samples, it is in the presentation of results. Statistics and measurements cannot tell you what a small selection of samples, examined carefully, does tell you.

What about the issue of small vs. large numbers of samples? Aren't more samples better? OK, suppose you have an infinite loop, or, if not infinite, one that just runs far longer than you know it should. Would 1000 stack samples find it any better than a single sample? (No.) If you look at it under a debugger, you know you're in the loop because it takes basically 100% of the time. It's on the stack somewhere; just scan up the stack until you find it. Even if the loop only takes 50% or 20% of the time, that's the probability each sample will see it. So, if you see something you could get rid of on as few as two samples, it's worth fixing. So, what do the 1000 samples buy you?
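To put rough numbers on the "two samples" rule, here is a small sketch. It assumes each sample is taken at an independent random moment, so a problem that occupies a fraction `fraction` of the run time appears on each sample with that probability. The function name `prob_seen_at_least_twice` is made up for this illustration.

```python
def prob_seen_at_least_twice(fraction, samples):
    """Probability that a problem taking `fraction` of the time shows up
    on two or more of `samples` independent stack samples."""
    miss_all = (1.0 - fraction) ** samples
    seen_once = samples * fraction * (1.0 - fraction) ** (samples - 1)
    return 1.0 - miss_all - seen_once


# Tabulate a few cases: a 50% problem and a 20% problem.
for fraction in (0.5, 0.2):
    for samples in (5, 10, 20):
        p = prob_seen_at_least_twice(fraction, samples)
        print(f"fraction={fraction:.0%} samples={samples:2d} "
              f"P(seen twice or more)={p:.3f}")
```

Under these assumptions, a problem taking half the time is almost certain to appear twice within ten samples, which is why a few manual samples are enough to find the big items.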

Maybe one thinks: "So what if we miss a problem or two? Maybe it's good enough." Well, is it? Suppose the code has three problems: P taking 50% of the time, Q taking 25%, and R taking 12.5%. The good stuff is called A. The following shows the speedup you get if you fix one of them, two of them, or all three of them.

PRPQPQPAPQPAPRPQ original time with avoidable code P, Q, and R all mixed together
RQQAQARQ         fix P           - 2 x   speedup
PRPPPAPPAPRP     fix Q           - 1.3 x    "
PPQPQPAPQPAPPQ   fix R           - 1.14 x   "
RAAR             fix P and Q     - 4 x      "
QQAQAQ           fix P and R     - 2.7 x    "
PPPPAPPAPP       fix Q and R     - 1.6 x    "
AA               fix P, Q, and R - 8 x   speedup



Does this make it clear why the ones that "get away" really hurt? The best you can do if you miss any is twice as slow.
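The speedup figures for P, Q, and R follow from one line of arithmetic: if the fixed problems accounted for a total fraction f of the original time, the remaining time is 1 - f and the speedup is 1 / (1 - f). A minimal sketch (the function name `speedup` is chosen here for illustration, and each fix is assumed to remove its problem's time entirely):

```python
def speedup(removed_fractions):
    """Speedup from eliminating problems that together took
    sum(removed_fractions) of the original run time."""
    remaining = 1.0 - sum(removed_fractions)
    if remaining <= 0.0:
        raise ValueError("cannot remove 100% or more of the run time")
    return 1.0 / remaining


P, Q, R = 0.5, 0.25, 0.125  # fractions of time from the example

for label, fixed in [
    ("P", [P]), ("Q", [Q]), ("R", [R]),
    ("P and Q", [P, Q]), ("P and R", [P, R]), ("Q and R", [Q, R]),
    ("P, Q, and R", [P, Q, R]),
]:
    print(f"fix {label:12s} -> {speedup(fixed):.2f}x")
```

Missing even the smallest problem, R, leaves you at 4x instead of 8x, which is the "twice as slow" best case.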

They are easy to find if you examine samples. P is on half the samples. If you fix P and do it again, Q is on half the samples. Once you fix Q, R is on half the samples. Fix R and you've got your 8x speedup. You don't have to stop there. You can keep going until you truly can't find anything to fix.
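The same progression can be replayed on the example timeline, treating each letter as one time slice that a sample might land on. This is only an illustrative sketch; the helper names `share` and `fix` are invented for it, and "fixing" a problem is modeled as deleting its slices.

```python
TIMELINE = "PRPQPQPAPQPAPRPQ"  # original run: P, Q, R are avoidable, A is needed


def share(timeline, problem):
    """Fraction of time slices (and so of random samples) showing `problem`."""
    return timeline.count(problem) / len(timeline)


def fix(timeline, problem):
    """Model fixing a problem by removing all of its time slices."""
    return timeline.replace(problem, "")


timeline = TIMELINE
for problem in "PQR":
    seen = share(timeline, problem)
    timeline = fix(timeline, problem)
    total_speedup = len(TIMELINE) / len(timeline)
    print(f"{problem} was on {seen:.0%} of samples; "
          f"after fixing it the run is {timeline!r}, {total_speedup:g}x faster")
```

Each fix makes the next problem take a larger share of what remains, so it becomes just as easy to spot as the previous one was.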

The more problems there are, the higher the potential speedup, but you can't afford to miss any. The problem with profilers (even good ones) is that, by denying you the chance to see and study individual samples, they hide problems that you need to find. More on all that. For the statistically inclined, here's how it works.

There are good profilers. The best are wall-time stack samplers that report inclusive percent at individual lines, letting you turn sampling on and off with a hot-key. Zoom is such a profiler.
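"Inclusive percent" here means the share of stack samples on which a given line (or function) appears anywhere on the stack. A rough sketch of that calculation follows; the function `inclusive_percent` and the sample stacks are hypothetical, invented only to show the arithmetic, and do not describe how any particular profiler is implemented.

```python
from collections import Counter


def inclusive_percent(samples):
    """Map each frame to the percentage of samples whose stack contains it."""
    if not samples:
        raise ValueError("need at least one stack sample")
    counts = Counter()
    for stack in samples:
        # Count a frame once per sample, even if recursion repeats it.
        counts.update(set(stack))
    return {frame: 100.0 * n / len(samples) for frame, n in counts.items()}


# Hypothetical samples, outermost frame first (made-up names).
samples = [
    ["main", "load_config"],
    ["main", "process", "sort_records"],
    ["main", "process", "format_output", "to_string"],
    ["main", "process", "format_output", "to_string"],
]

for frame, pct in sorted(inclusive_percent(samples).items(),
                         key=lambda item: -item[1]):
    print(f"{pct:5.1f}%  {frame}")
```

The percentage tells you how much a frame costs, but only reading the individual stacks tells you why it was being called.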

But even those make the mistake of assuming you need lots of samples. You don't, and the price you pay for them is that you can't actually see any, so you can't see why the time is being spent, so you can't easily tell if it's necessary, and you can't get rid of something unless you know you don't need it. The result is that you miss bottlenecks, and they end up stunting your speedup.

</flame>
