Beyond Stack Sampling: C++ Profilers


Question

A Hacker's Tale

The date is 12/02/10. The days before Christmas are dripping away and I've pretty much hit a major roadblock as a Windows programmer. I've been using AQTime, I've tried Sleepy, Shiny, and Very Sleepy, and as we speak, VTune is installing. I've tried to use the VS2008 profiler, and it's been positively punishing as well as often insensible. I've used the random pause technique. I've examined call trees. I've fired off function traces. But the sad, painful fact of the matter is that the app I'm working with is over a million lines of code, with probably another million lines' worth of third-party apps.

I need better tools. I've read the other topics. I've tried out each profiler listed in each topic. There simply has to be something better than these junky and expensive options, or ludicrous amounts of work for almost no gain. To further complicate matters, our code is heavily threaded, and runs a number of Qt Event loops, some of which are so fragile that they crash under heavy instrumentation due to timing delays. Don't ask me why we're running multiple event loops. No one can tell me.

Are there any options more along the lines of Valgrind in a Windows environment?
Is there anything better than the long swath of broken tools I've already tried?
Is there anything designed to integrate with Qt, perhaps with a useful display of events in queue?

A full list of the tools I tried, with the ones that were really useful in italics:

  • AQTime: Rather good! Has some trouble with deep recursion, but the call graph is correct in these cases, and can be used to clear up any confusion you might have. Not a perfect tool, but worth trying out. It might suit your needs, and it certainly was good enough for me most of the time.
  • Random Pause attack in debug mode: Not enough information enough of the time. A good tool but not a complete solution.
  • Parallel Studios: The nuclear option. Obtrusive, weird, and crazily powerful. I think you should hit up the 30 day evaluation, and figure out if it's a good fit. It's just darn cool, too.
  • AMD CodeAnalyst: Wonderful, easy to use, very crash-prone, but I think that's an environment thing. I'd recommend trying it, as it is free.
  • Luke Stackwalker: Works fine on small projects, but it's a bit trying to get it working on ours. Some good results though, and it definitely replaces Sleepy for my personal tasks.
  • PurifyPlus: No support for Win-x64 environments, most prominently Windows 7. Otherwise excellent. A number of my colleagues in other departments swear by it.
  • VS2008 Profiler: Produces output in the 100+ GB range in function trace mode at the required resolution. On the plus side, produces solid results.
  • GProf: Requires GCC to be even moderately effective.
  • VTune: VTune's W7 support borders on criminal. Otherwise excellent.
  • PIN: I'd need to hack up my own tool, so this is sort of a last resort.
  • Sleepy/Very Sleepy: Useful for smaller apps, but failing me here.
  • EasyProfiler: Not bad if you don't mind a bit of manually injected code to indicate where to instrument.
  • Valgrind: *nix only, but very good when you're in that environment.
  • OProfile: Linux only.
  • Proffy: They shoot wild horses.

Suggested tools that I haven't tried:

  • XPerf:
  • Glowcode:
  • Devpartner:

Notes: Intel environment at the moment. VS2008, Boost libraries. Qt 4+. And the wretched humdinger of them all: Qt/MFC integration via Trolltech.


Now: Almost two weeks later, it looks like my issue is resolved. Thanks to a variety of tools, including almost everything on the list and a couple of my personal tricks, we found the primary bottlenecks. However, I'm going to keep testing, exploring, and trying out new profilers as well as new tech. Why? Because I owe it to you guys, because you guys rock. It does slow the timeline down a little, but I'm still very excited to keep trying out new tools.

Synopsis
Among many other problems, a number of components had recently been switched to the incorrect threading model, causing serious hang-ups due to the fact that the code underneath us was suddenly no longer multithreaded. I can't say more because it violates my NDA, but I can tell you that this would never have been found by casual inspection or even by normal code review. Without profilers, callgraphs, and random pausing in conjunction, we'd still be screaming our fury at the beautiful blue arc of the sky. Thankfully, I work with some of the best hackers I've ever met, and I have access to an amazing 'verse full of great tools and great people.

Gentlefolk, I appreciate this tremendously, and only regret that I don't have enough rep to reward each of you with a bounty. I still think this is an important question to get a better answer to than the ones we've got so far on SO.

As a result, each week for the next three weeks, I'll be putting up the biggest bounty I can afford, and awarding it to the answer with the nicest tool that I think isn't common knowledge. After three weeks, we'll hopefully have accumulated a definitive profile of the profilers, if you'll pardon my punning.

Take-away
Use a profiler. They're good enough for Ritchie, Kernighan, Bentley, and Knuth. I don't care who you think you are. Use a profiler. If the one you've got doesn't work, find another. If you can't find one, code one. If you can't code one, or it's a small hang up, or you're just stuck, use random pausing. If all else fails, hire some grad students to bang out a profiler.


A Longer View
So, I thought it might be nice to write up a bit of a retrospective. I opted to work extensively with Parallel Studios, in part because it is actually built on top of the PIN Tool. Having had academic dealings with some of the researchers involved, I felt that this was probably a mark of some quality. Thankfully, I was right. While the GUI is a bit dreadful, I found IPS to be incredibly useful, though I can't comfortably recommend it for everyone. Critically, there's no obvious way to get line-level hit counts, something that AQT and a number of other profilers provide, and I've found very useful for examining rate of branch-selection among other things. In net, I've enjoyed using AQTime as well, and I've found their support to be really responsive. Again, I have to qualify my recommendation: A lot of their features don't work that well, and some of them are downright crash-prone on Win7x64. XPerf also performed admirably, but is agonizingly slow for the sampling detail required to get good reads on certain kinds of applications.

Right now, I'd have to say that I don't think there's a definitive option for profiling C++ code in a W7x64 environment, but there are certainly options that simply fail to perform any useful service.

Solution

First:

Time sampling profilers are more robust than CPU sampling profilers. I'm not extremely familiar with Windows development tools so I can't say which ones are which. Most profilers are CPU sampling.

A CPU sampling profiler grabs a stack trace every N instructions.
This technique will reveal portions of your code that are CPU bound, which is awesome if that is the bottleneck in your application. It's not so great if your application threads spend most of their time fighting over a mutex.

A time sampling profiler grabs a stack trace every N microseconds.
This technique will zero in on "slow" code, whether the cause is CPU-bound, blocking-I/O-bound, mutex-bound, or cache-thrashing sections of code. In short, whatever piece of code is slowing your application will stand out.

So use a time sampling profiler if at all possible, especially when profiling threaded code.
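
To make the difference concrete, here is a minimal sketch (not taken from any particular profiler) that times the same region with a wall clock and with a CPU clock. The names `Timing`, `measure`, and `blocked_work` are made up for illustration, and the sleep is only a stand-in for a thread stuck on a mutex or blocking I/O. It also assumes a POSIX-style `std::clock()` that counts processor time; MSVC's `clock()` tracks wall-clock time, so the comparison would not show the gap there.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Elapsed wall-clock and CPU time for one measured region, in milliseconds.
struct Timing {
    double wall_ms;
    double cpu_ms;
};

// Runs `work` once and reports both clocks.
// Assumption: std::clock() reports processor time (true on POSIX systems).
template <typename Work>
Timing measure(Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    assert(wall_end >= wall_start);  // steady_clock is monotonic

    Timing t;
    t.wall_ms = std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// Hypothetical stand-in for a thread blocked on a mutex or on I/O:
// it burns wall-clock time but almost no CPU.
inline void blocked_work() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

Calling `measure(blocked_work)` should report roughly 200 ms of wall time and close to zero CPU time. A sampler driven by CPU activity would take almost no samples in that region, while one driven by elapsed time would land there over and over.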

Second:

Sampling profilers generate gobs of data. The data is extremely useful, but there is often too much to be easily useful. A profile data visualizer helps tremendously here. The best tool I've found for profile data visualization is gprof2dot. Don't let the name fool you, it handles all kinds of sampling profiler output (AQtime, Sleepy, XPerf, etc). Once the visualization has pointed out the offending function(s), jump back to the raw profile data to get better hints on what the real cause is.
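
If you do end up staring at the raw samples, the basic reduction is simple enough to sketch. The following is an illustrative example, not the format or algorithm of any specific profiler: it assumes each sample is just a list of function names, outermost frame first, and the names `Stack`, `Counts`, and `aggregate` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One captured call stack, outermost frame first (e.g. main -> load -> parse).
using Stack = std::vector<std::string>;

struct Counts {
    int inclusive = 0;  // samples where the function appears anywhere on the stack
    int self = 0;       // samples where the function is the innermost frame
};

// Tallies per-function sample counts. A function is counted at most once
// per sample for `inclusive`, so recursion does not inflate its total.
std::map<std::string, Counts> aggregate(const std::vector<Stack>& samples) {
    std::map<std::string, Counts> result;
    for (const Stack& stack : samples) {
        if (stack.empty()) continue;
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const std::string& fn : unique) ++result[fn].inclusive;
        Counts& leaf = result[stack.back()];
        ++leaf.self;
        assert(leaf.self <= leaf.inclusive);  // the leaf is always on its own stack
    }
    return result;
}
```

Dividing either count by the total number of samples gives the fraction of time a function was on the stack (inclusive) or actually executing (self). A function with a high inclusive count but low self count is usually just a caller of the real culprit.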

The gprof2dot tool generates a dot graph description that you then feed into a graphviz tool. The output is basically a callgraph with functions color coded by their impact on the application.
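
For a feel of what such a dot description looks like, here is a rough sketch that writes one by hand from already-aggregated data. It is not gprof2dot's code: the `Node` and `Edge` structs, the function names, and the linear blue-to-red color ramp are all assumptions made for illustration, and it assumes function names contain no double quotes.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

struct Node { std::string name; double percent; };              // share of total time, 0..100
struct Edge { std::string caller; std::string callee; int samples; };

// Maps a share of total time to a fill color, blue (cold) to red (hot).
// Illustrative linear ramp only; not gprof2dot's actual color theme.
std::string heat_color(double percent) {
    assert(percent >= 0.0 && percent <= 100.0);
    const int red = static_cast<int>(255.0 * percent / 100.0);
    char buf[8];  // "#rrggbb" plus terminating NUL
    std::snprintf(buf, sizeof buf, "#%02x00%02x", red, 255 - red);
    return buf;
}

// Builds a Graphviz dot description of the call graph.
// Assumes names contain no double quotes, which would need escaping.
std::string to_dot(const std::vector<Node>& nodes, const std::vector<Edge>& edges) {
    std::ostringstream out;
    out << "digraph profile {\n";
    out << "  node [shape=box, style=filled, fontcolor=white];\n";
    for (const Node& n : nodes)
        out << "  \"" << n.name << "\" [fillcolor=\"" << heat_color(n.percent) << "\"];\n";
    for (const Edge& e : edges)
        out << "  \"" << e.caller << "\" -> \"" << e.callee
            << "\" [label=\"" << e.samples << "\"];\n";
    out << "}\n";
    return out.str();
}
```

Writing the returned string to a file and running it through a Graphviz layout tool such as `dot` renders the colored call graph.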

A few hints to get gprof2dot to generate nice output.

  • I use a --skew of 0.001 on my graphs so I can easily see the hot code paths. Otherwise int main() dominates the graph.
  • If you're doing anything crazy with C++ templates, you'll probably want to add --strip. This is especially true with Boost.
  • I use OProfile to generate my sampling data. To get good output I need to configure it to load the debug symbols from my third-party and system libraries. Be sure to do the same, otherwise you'll see that CRT is taking 20% of your application's time when what's really going on is malloc is trashing the heap and eating up 15%.
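
To see why stripping matters for template-heavy code, here is a deliberately simplified sketch of the idea: drop everything inside `<...>` and `(...)` from a demangled name so the graph labels stay readable. This is not how gprof2dot implements --strip; `strip_signature` is a hypothetical helper, it leaves trailing qualifiers such as `const` in place, and it does not handle names like `operator<`, `operator->`, or `operator()`.

```cpp
#include <cassert>
#include <string>

// Removes template argument lists <...> and parameter lists (...) from a
// demangled C++ name. Simplified sketch: operator<, operator-> and
// operator() names are NOT handled correctly.
std::string strip_signature(const std::string& name) {
    std::string out;
    int depth = 0;  // current nesting level inside <...> or (...)
    for (char c : name) {
        if (c == '<' || c == '(') {
            ++depth;
        } else if (c == '>' || c == ')') {
            if (depth > 0) --depth;
        } else if (depth == 0) {
            out += c;  // keep only characters outside any bracket pair
        }
    }
    assert(out.size() <= name.size());  // stripping never grows the name
    return out;
}
```

With Boost or the standard library in the mix, a single node label can otherwise run to hundreds of characters, which makes the rendered graph close to unreadable.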
