Process execution tracing tools

Question

I am currently investigating a very peculiar problem on our lab servers. Whenever we run a Java program on a machine with a 64-bit SUSE SLES11 installation that is accessed through Citrix, it just hangs. The machine has the latest updates, but that doesn't help. If any one of these circumstances changes, it works: a 32-bit OS, SLES 10.2, or access via Cygwin/Exceed. Other X applications, such as xclock, work fine.

This might look like a ServerFault question so far, but what I'm actually looking for is suggestions for software I can use to trace what this program is actually doing. It hangs on a FUTEX_WAIT (found using strace):

futex(0x7f4e3eaab9e0, FUTEX_WAIT, 19686, NULL

The trace stops right after the NULL and stays there indefinitely. I have found a previous bug report that looks a little similar to this problem, but the circumstances are very different.

UPDATE: Apparently, futex_wait problems are a sign of strange race conditions in the kernel/libc that lock up processes. I will have to try a newer kernel/libc and see whether either makes any difference.

UPDATE 2: The kernel/libc changes made no difference. I did manage to start jvisualvm with a predictable external JMX port, let it hang, and connect to that port from another machine, at which point I found this in the thread trace for main:

Name: main
State: RUNNABLE
Total blocked: 0  Total waited: 0

Stack trace: 
sun.awt.X11GraphicsDevice.getDoubleBufferVisuals(Native Method)
sun.awt.X11GraphicsDevice.makeDefaultConfiguration(X11GraphicsDevice.java:208)
sun.awt.X11GraphicsDevice.getDefaultConfiguration(X11GraphicsDevice.java:182)
   - locked java.lang.Object@1c190c99
sun.awt.X11.XToolkit.<clinit>(XToolkit.java:92)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:169)
java.awt.Toolkit$2.run(Toolkit.java:834)
java.security.AccessController.doPrivileged(Native Method)
java.awt.Toolkit.getDefaultToolkit(Toolkit.java:826)
   - locked java.lang.Class@308a1f38
org.openide.util.ImageUtilities.ensureLoaded(ImageUtilities.java:519)
org.openide.util.ImageUtilities.access$200(ImageUtilities.java:80)
org.openide.util.ImageUtilities$ToolTipImage.createNew(ImageUtilities.java:699)
org.openide.util.ImageUtilities.getIcon(ImageUtilities.java:487)
   - locked java.util.HashMap@3c07ae6d
org.openide.util.ImageUtilities.getIcon(ImageUtilities.java:361)
   - locked java.util.HashMap@1c4c94e5
org.openide.util.ImageUtilities.loadImage(ImageUtilities.java:139)
org.netbeans.core.startup.Splash.loadContent(Splash.java:262)
org.netbeans.core.startup.Splash$SplashComponent.<init>(Splash.java:344)
org.netbeans.core.startup.Splash.<init>(Splash.java:170)
org.netbeans.core.startup.Splash.getInstance(Splash.java:102)
org.netbeans.core.startup.Main.start(Main.java:301)
org.netbeans.core.startup.TopThreadGroup.run(TopThreadGroup.java:110)
java.lang.Thread.run(Thread.java:619)
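
A dump of this kind can also be produced from inside a JVM, which may help when no JMX connection is available. The following is only a minimal sketch: the class name ThreadDumpSketch and the method dumpAllThreads are hypothetical, and the output format merely imitates the one above. It relies on the standard Thread.getAllStackTraces() call. For a process that is already hung, an external tool such as jstack <pid> is usually the more practical route, because the dump here has to be triggered from a thread that is still running.

```java
import java.util.Map;

/** Hypothetical helper (not from the original post): dumps every live thread. */
final class ThreadDumpSketch {

    /** Formats name, state, and stack frames for all live threads in this JVM. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append("Name: ").append(thread.getName()).append('\n');
            out.append("State: ").append(thread.getState()).append('\n');
            // One line per frame, innermost frame first.
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A thread that sits inside a native method, like main in the trace above, is reported as RUNNABLE even if the native code itself is blocked, so the state alone does not show whether it is making progress.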

I tried the deadlock detection button in jvisualvm, but it found no deadlocks.
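
For context, the JVM exposes deadlock detection through ThreadMXBean.findDeadlockedThreads(), and a check like the one in jvisualvm presumably works along similar lines (that is an assumption, not something confirmed here). The sketch below uses hypothetical names (DeadlockCheckSketch, startDeadlock, awaitDeadlock) and deliberately creates a two-thread lock cycle to show what such a check does report. It only finds cycles among Java-level monitors and java.util.concurrent locks, so a thread stuck inside a native call would not be flagged.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch: what JVM-level deadlock detection can and cannot see. */
final class DeadlockCheckSketch {

    /** Returns the ids of threads caught in a lock cycle, or an empty array. */
    static long[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        return ids == null ? new long[0] : ids;
    }

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        lockInOrder(a, b, bothHoldFirstLock);
        lockInOrder(b, a, bothHoldFirstLock);
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch ready) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                ready.countDown();
                try {
                    ready.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds 'second' forever.
                }
            }
        });
        t.setDaemon(true); // let the JVM exit despite the deadlock
        t.start();
    }

    /** Polls until a deadlock is reported or the timeout expires. */
    static long[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long[] ids = findDeadlocked();
        while (ids.length == 0 && System.nanoTime() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = findDeadlocked();
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("before: " + findDeadlocked().length + " deadlocked threads");
        startDeadlock();
        System.out.println("after: " + awaitDeadlock(5000).length + " deadlocked threads");
    }
}
```

This would be consistent with the result above: main is RUNNABLE inside the native method getDoubleBufferVisuals, so there is no Java-level lock cycle for the detector to find.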

I am currently talking to Citrix Europe about this problem and delivering traces to them. I will update this question if it gets solved.

UPDATE 3: This problem has been traced to Citrix and submitted as service request number 60235154. At the moment it seems the problem is either somewhere in Java or in the Citrix implementation of X11.

Recommended answer

ltrace traces shared-library function calls, which can give you a higher-level view of things. But it can also produce far more output than strace, since many library functions (e.g. strcmp) don't result in system calls.

But futex is used for locking, so if you get stuck at a futex, you have probably deadlocked. Or you're just looking at one thread that is waiting for other threads. With -f, ltrace and strace follow clone/fork to trace all threads and all child processes.
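
The second possibility, one thread merely waiting for another, can be illustrated with a small Java sketch. The class and method names (WaitingNotDeadlockedSketch, observeStates) are hypothetical. A "waiter" thread joins a busy "worker" thread, so the waiter is parked while the worker keeps running. On Linux such waits are, as far as I know, typically implemented on top of futex, so tracing only the waiter would show a blocked futex call even though the process as a whole is doing work.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: a thread that waits on another thread is not deadlocked. */
final class WaitingNotDeadlockedSketch {

    /** Returns {waiter state, worker state} sampled while the worker is busy. */
    static Thread.State[] observeStates() {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            while (!stop.get()) {
                // Busy loop standing in for real work.
            }
        }, "worker");
        Thread waiter = new Thread(() -> {
            try {
                worker.join(); // blocks until the worker finishes
            } catch (InterruptedException ignored) {
                // Nothing to clean up in this sketch.
            }
        }, "waiter");
        worker.setDaemon(true);
        waiter.setDaemon(true);
        worker.start();
        waiter.start();
        try {
            // Give the waiter up to five seconds to block in join().
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (waiter.getState() != Thread.State.WAITING
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State[] states = { waiter.getState(), worker.getState() };
            stop.set(true); // release the worker, which in turn releases the waiter
            waiter.join(5000);
            return states;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        Thread.State[] states = observeStates();
        System.out.println("waiter: " + states[0] + ", worker: " + states[1]);
    }
}
```

This is why following all threads matters: a trace of a single blocked thread cannot distinguish this harmless case from a real deadlock.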

In gdb, thread apply all <command> is sometimes useful for multithreaded processes, e.g. thread apply all bt.
