Profiled app with YourKit, still can't identify the CPU hog

Problem Description

I've got a Java application that is consuming 100% of the CPU most of the time (as indicated by Cacti and top monitoring). We fired up YourKit (which confirms the CPU resource issue), and it identifies java.net.SocketInputStream.read(byte[], int, int) as the biggest hot spot at 15% of time. I believe YourKit isn't accurately measuring CPU time for methods that perform blocking I/O, such as SocketInputStream.read.

There are six other identified hot spots, but combined they account for less than 20% of the measured CPU time, each in the 1%-5% range.

So I know I have a problem, I can see the problem, YourKit does too, but I am no closer to identifying the actual problem.

I am pretty new to using a profiler, and am most likely missing something. Any ideas?

Edit: Sean makes a good point about using tools built into the system. If I use top and press Shift+H to view threads, it displays anywhere from 7 to 15 threads, and the CPU utilization jumps around between them. I don't believe any one thread is causing the problem; rather, it is a piece of code that each thread executes at some point.

Recommended Answer

If you can, I would recommend running this on a Solaris box. If you don't have one, consider setting up a virtual machine running OpenSolaris.

Solaris offers a tool called prstat.

prstat works much like top, which most people are familiar with. The important difference is that prstat can break a process up for you and show each thread within it.

For your case, the usage would be:

 prstat -L 0 1

Paired with a thread dump (preferably captured together with prstat in a script), you can match up the LWPIDs to find exactly which thread is the CPU hog.
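Since a script is preferred, here is a minimal Java sketch of one capture pass. It reuses the answer's prstat -L 0 1 invocation and keeps only lines containing "java", as grep java would. It assumes the thread dump is taken with the JDK's jstack tool on the PATH (the answer does not say how its dump was taken), and the class name SnapshotCapture is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Minimal sketch of one capture pass: a prstat -L sample and a thread dump taken
// back to back so their LWPIDs and nids describe roughly the same moment.
// Assumptions: Solaris prstat and the JDK's jstack are both on the PATH.
class SnapshotCapture {

    // Runs a command and returns its stdout and stderr combined, as text.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        String out;
        try (InputStream in = p.getInputStream()) {
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        String pid = args[0]; // PID of the busy java process (924 in the example)

        // Same invocation as in the answer; keep only java rows, like "| grep java".
        StringBuilder prstatJava = new StringBuilder();
        for (String line : run(List.of("prstat", "-L", "0", "1")).split("\n")) {
            if (line.contains("java")) {
                prstatJava.append(line).append('\n');
            }
        }

        String dump = run(List.of("jstack", pid)); // assumption: jstack takes the dump

        System.out.print(prstatJava);
        System.out.println("----");
        System.out.print(dump);
    }
}
```

Run it as java SnapshotCapture 924 while the CPU is pegged and redirect the output to a file, the same way the answer redirects prstat.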

Here is a working example (I created a small app that runs a big loop as a proof of concept).
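The source of that app is not shown. The sketch below is a hypothetical reconstruction that is merely consistent with the thread dump later in this answer (a ConsumerThread whose run method spins on java.util.Random.nextInt); it is not the author's code. The loop checks the interrupt flag so the thread can be stopped cleanly.

```java
import java.util.Random;

// Hypothetical reconstruction of the proof-of-concept app: only the stack trace
// (ConsumerThread.run -> Random.nextInt) is known, not the original source.
class ConsumerThread extends Thread {
    private final Random random = new Random();
    private volatile long sink; // consume results so the JIT cannot drop the loop body

    @Override
    public void run() {
        // Busy loop: keeps one CPU busy until the thread is interrupted.
        while (!isInterrupted()) {
            sink += random.nextInt(100);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConsumerThread t = new ConsumerThread();
        t.start();
        // Leave it running; sample it with prstat -L and a thread dump meanwhile.
        t.join();
    }
}
```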

Standard top will show you something like the following:

 PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  924 username   10  59    0   31M   11M run      0:53 36.02% java

Then, using prstat, the following command was run:

 prstat -L 0 1 | grep java > /export/home/username/Desktop/output.txt

And the output from prstat:

PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID    
924 username   31M   10M run     30    0   0:00:09  35% java/10
924 username   31M   10M sleep   59    0   0:00:00 0.8% java/3
924 username   31M   10M sleep   59    0   0:00:00 0.6% java/2
924 username   31M   10M sleep   59    0   0:00:00 0.3% java/1

This may not look much different than top, but notice the right side of the data: the PROCESS/LWPID column tells you the exact thread within the java process that is consuming the CPU. The thread running as lightweight process ID (LWPID) 10 is consuming 35% of the CPU. As I mentioned before, if you pair this with a thread dump, you can find the exact thread. In my case, this is the relevant portion of the thread dump:

"Thread-0" prio=3 tid=0x08173800 nid=0xa runnable [0xc60fc000..0xc60fcae0]
   java.lang.Thread.State: RUNNABLE
    at java.util.Random.next(Random.java:139)
    at java.util.Random.nextInt(Random.java:189)
    at ConsumerThread.run(ConsumerThread.java:13)

On the top line of the thread entry, the nid can be matched to the LWPID: nid=0xa is 10 when converted from hexadecimal to decimal.
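The same match can be sketched in Java: pick the busiest LWPID from the PROCESS/LWPID column of prstat -L output, then find the thread-dump header whose hex nid converts to that decimal value. The class LwpMatcher and its regular expressions are assumptions written against the sample output in this answer, not against a documented format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the manual match: busiest LWPID from prstat -L, then the thread-dump
// header whose hexadecimal nid equals that (decimal) LWPID.
class LwpMatcher {
    // Matches the trailing "35% java/10" part of a prstat row: CPU% and LWPID.
    private static final Pattern PRSTAT_ROW =
            Pattern.compile("\\s([0-9.]+)%\\s+\\S+/(\\d+)\\s*$");
    // Matches a dump header such as "Thread-0" prio=3 tid=0x08173800 nid=0xa runnable.
    private static final Pattern DUMP_HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\bnid=0x([0-9a-fA-F]+)");

    // Returns the LWPID with the highest CPU% in the prstat output, or -1 if none parse.
    static int busiestLwpid(String prstatOutput) {
        int best = -1;
        double bestCpu = -1;
        for (String line : prstatOutput.split("\n")) {
            Matcher m = PRSTAT_ROW.matcher(line);
            if (m.find()) {
                double cpu = Double.parseDouble(m.group(1));
                if (cpu > bestCpu) {
                    bestCpu = cpu;
                    best = Integer.parseInt(m.group(2));
                }
            }
        }
        return best;
    }

    // Returns the name of the thread whose nid (hex) equals lwpid (decimal), or null.
    static String threadForLwpid(String threadDump, int lwpid) {
        for (String line : threadDump.split("\n")) {
            Matcher m = DUMP_HEADER.matcher(line);
            if (m.find() && Integer.parseInt(m.group(2), 16) == lwpid) {
                return m.group(1);
            }
        }
        return null;
    }
}
```

Fed the prstat rows and thread dump shown in this answer, busiestLwpid returns 10 and threadForLwpid(dump, 10) returns "Thread-0".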

If you put the prstat and thread dump commands in a script and run it 4-5 times during periods of high CPU usage, you will begin to see patterns and be able to determine the cause of your high CPU that way.
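Once several samples have been captured, spotting the pattern can be as simple as a count. This hypothetical HotFrameTally sketch takes, for each sample, the thread dump text and the hottest LWPID reported by prstat, finds that thread's entry by its hex nid, and counts its top stack frame across samples.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "look for patterns" step. Hypothetical inputs: one thread dump per
// sample and, in the same order, the hottest LWPID prstat reported for that sample.
class HotFrameTally {

    // Returns the first "at ..." frame of the thread whose nid equals lwpid, or null.
    static String topFrame(String threadDump, int lwpid) {
        String nid = "nid=0x" + Integer.toHexString(lwpid) + " "; // trailing space avoids 0xa vs 0xab
        boolean inThread = false;
        for (String raw : threadDump.split("\n")) {
            if (raw.startsWith("\"")) {           // a new thread entry begins
                inThread = raw.contains(nid);
            } else if (inThread && raw.trim().startsWith("at ")) {
                return raw.trim().substring(3);   // drop the leading "at "
            }
        }
        return null;
    }

    // Counts how often each top frame appears for the hottest thread across samples.
    static Map<String, Integer> tally(List<String> dumps, List<Integer> hotLwpids) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < dumps.size(); i++) {
            String frame = topFrame(dumps.get(i), hotLwpids.get(i));
            if (frame != null) {
                counts.merge(frame, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Counting only the topmost frame is a simplification: in the example dump that frame is java.util.Random.next, so in practice you may prefer to count the first frame from your own packages (ConsumerThread.run there).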

In my experience, I have seen this caused by everything from long-running GC times to a misconfigured LDAP connection. Have fun :)
