Unknown events in nodejs/v8 flamegraph using perf_events

Question

I am trying to do some nodejs profiling using Linux perf_events, as described by Brendan Gregg.

The workflow is as follows:

  1. Run node > 0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
  2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
  3. Fold the stacks using the stackcollapse-perf.pl script from this repository
  4. Generate an SVG flame graph using the flamegraph.pl script

I get the following result (which looks really nice at first):

The problem is that there are a lot of [unknown] elements, which I suppose should be my nodejs function calls. I assume that the whole process fails somewhere at point 3, where the perf data should be folded using the mappings generated by node/v8 executed with --perf-basic-prof. The /tmp/perf-PID.map file is created, and some mappings are written to it during node execution.
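One way to check whether the map file actually covers the sampled addresses is to resolve a few of them by hand. The sketch below assumes the usual perf map layout of one `<start hex> <size hex> <symbol name>` entry per line. The sample entries and the helper names `parse_perf_map` and `resolve` are made up for illustration, and it assumes address ranges do not overlap.

```python
import bisect

def parse_perf_map(text):
    """Parse lines of the form '<start hex> <size hex> <symbol name>'."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        entries.append((start, size, name))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if start <= addr < start + size:
            return name
    return "[unknown]"  # no mapping covers this address

# Hypothetical map contents, not taken from a real run.
SAMPLE_MAP = """\
3ef414c0 398 RegExp:[{(]
3ef418a0 398 RegExp:[})]
59ed4102 26 LazyCompile:~REPLServer.self.writer repl.js:514
"""

entries = parse_perf_map(SAMPLE_MAP)
print(resolve(entries, 0x3ef414d0))  # inside the first range
print(resolve(entries, 0x10))        # not covered by any entry
```

If addresses taken from the [unknown] frames fall outside every range in the file, the mappings for those frames were never written; if they fall inside a range, the problem more likely lies in how the samples are being symbolized.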

How can I fix this?

I am using CentOS 6.5 x64, and I have already tried this with node 0.11.13 and 0.11.14 (both prebuilt and compiled from source) with no success.

Recommended answer

First of all, what "[unknown]" means is that the sampler couldn't figure out the name of the function, because it's a system or library function. If so, that's OK: you don't care, because you're looking for things responsible for time in your code, not system code.

Actually, I'm suggesting this is one of those XY questions. Even if you get a direct answer to what you asked, it is likely to be of little use. Here are the reasons why:

1. CPU profiling is of little use in an I/O-bound program

The two towers on the left in your flame graph are doing I/O, so they probably take a lot more wall-time than the big pile on the right. If this flame graph were derived from wall-time samples, rather than CPU-time samples, it could look more like the second graph below, which tells you where time actually goes:

What was a big juicy-looking pile on the right has shrunk, so it is nowhere near as significant. On the other hand, the I/O towers are very wide. Any one of those wide orange stripes, if it's in your code, represents a chance to save a lot of time, if some of the I/O could be avoided.

2. Whether the program is CPU-bound or I/O-bound, speedup opportunities can easily hide from flame graphs

Suppose there is some function Foo that really is doing something wasteful, that if you knew about it, you could fix. Suppose in the flame graph, it is a dark red color. Suppose it is called from numerous places in the code, so it's not all collected in one spot in the flame graph. Rather it appears in multiple small places shown here by black outlines:

Notice, if all those rectangles were collected, you could see that it accounts for 11% of time, meaning it is worth looking at. If you could cut its time in half, you could save 5.5% overall. If what it's doing could actually be avoided entirely, you could save 11% overall. Each of those little rectangles would shrink down to nothing, and pull the rest of the graph, to its right, with it.
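The "collect all the rectangles" step can be sketched directly on folded stacks, the `frame;frame;frame count` lines that a stack-collapsing script emits. The stack names and counts below are invented so that Foo totals 11%, and `inclusive_fraction` is a hypothetical helper, not part of any flame graph tool.

```python
def inclusive_fraction(folded_text, func):
    """Fraction of all samples whose stack contains func anywhere."""
    total = 0
    hits = 0
    for line in folded_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        if func in stack.split(";"):
            hits += n
    return hits / total if total else 0.0

# Hypothetical folded stacks: Foo is reached from three different callers.
FOLDED = """\
main;handleRequest;parse;Foo 4
main;handleRequest;render;Foo 3
main;handleRequest;render;template 40
main;readConfig;Foo 4
main;idle 49
"""

print(f"Foo is on the stack in {inclusive_fraction(FOLDED, 'Foo'):.0%} of samples")
```

No single Foo rectangle here is wider than 4% of the graph, yet the function as a whole accounts for 11 of the 100 samples.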

Now I'll show you the method I use. I take a moderate number of random stack samples and examine each one for routines that might be sped up. That corresponds to taking samples in the flame graph like so:

The slender vertical lines represent twenty random-time stack samples. As you can see, three of them are marked with an X. Those are the ones that go through Foo. That's about the right number, because 11% times 20 is 2.2.

(Confused? OK, here's a little probability for you. If you flip a coin 20 times, and it has an 11% chance of coming up heads, how many heads would you get? Technically it's a binomial distribution. The most likely number you would get is 2; the next most likely numbers are 1 and 3. (If you only get 1, you keep going until you get 2.) Here's the distribution:)

(The average number of samples you have to take to see Foo twice is 2/0.11 = 18.2 samples.)
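The numbers in the two parenthetical notes above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, treating each of the 20 samples as an independent trial with an 11% chance of landing in Foo; the variable names are illustrative.

```python
from math import comb

def binomial_pmf(n, p, k):
    """Probability of exactly k hits in n independent trials with hit chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.11  # 20 stack samples, Foo on the stack 11% of the time
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]

# Print the part of the distribution that carries almost all the probability.
for k in range(6):
    print(f"P({k} samples hit Foo) = {pmf[k]:.3f}")

most_likely = max(range(n + 1), key=lambda k: pmf[k])
print("most likely number of hits:", most_likely)

# Mean number of samples needed to see Foo twice: r / p with r = 2.
expected_samples_for_two = 2 / p
print(f"average samples to see Foo twice: {expected_samples_for_two:.1f}")
```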

Looking at those 20 samples might seem a bit daunting, because they run between 20 and 50 levels deep. However, you can basically ignore all the code that isn't yours. Just examine them for your code. You'll see precisely how you are spending time, and you'll have a very rough measurement of how much. Deep stacks are both bad news and good news - they mean the code may well have lots of room for speedups, and they show you what those are.

Anything you see that you could speed up, if you see it on more than one sample, will give you a healthy speedup, guaranteed. The reason you need to see it on more than one sample is, if you only see it on one sample, you only know its time isn't zero. If you see it on more than one sample, you still don't know how much time it takes, but you do know it's not small. Here are the statistics.
