My JBoss server hits 100% SYS CPU on Linux; what can cause this?


Problem description

We have been debugging this JBoss server problem for quite a while. After about 10 hours of work, the server goes into 100% CPU panic attacks and just stalls. During these episodes no new program can be started, so we cannot even run kill -quit to get a stack trace. The 100% SYS CPU load lasts 10-20 seconds and repeats every few minutes.

We have been working on this for quite a while. We suspect it has something to do with the GC, but we cannot confirm it with a smaller program. We are running on i386 32-bit, RHEL5 and Java 1.5.0_10, using -client and the ParNew GC.

Here's what we have tried so far:

  1. We limited the CPU affinity so that we can still use the server when the high load hits. With strace we see an endless loop of SIGSEGV followed by sigreturn.

We tried to reproduce this with a Java program. SYS CPU% does climb high when using WeakHashMap or when accessing null pointers. The problem was that fillInStackTrace consumed a lot of user CPU%, which is why we never reached 100% SYS CPU.
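A reproduction attempt along these lines might look like the sketch below. It is not the poster's actual test program: the class name `NullDerefLoop`, the `Holder` type and the iteration count are hypothetical. HotSpot is generally understood to implement some null checks by trapping SIGSEGV rather than testing the reference explicitly, which may be why strace showed a stream of SIGSEGV signals, but the thread itself does not confirm that.

```java
public class NullDerefLoop {
    // Hypothetical holder type; it exists only so there is a field to read
    // through a null reference.
    static final class Holder {
        int value;
    }

    // Reads a field through a null reference `iterations` times and returns
    // how many NullPointerExceptions were caught.
    static int countNullDereferences(int iterations) {
        Holder holder = null;
        int sink = 0;   // never actually updated, because the read always throws
        int caught = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                sink += holder.value; // throws NullPointerException every time
            } catch (NullPointerException e) {
                caught++;
            }
        }
        return caught + sink;
    }

    public static void main(String[] args) {
        int iterations = 1000000; // arbitrary value chosen for illustration
        long start = System.currentTimeMillis();
        int caught = countNullDereferences(iterations);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(caught + " NullPointerExceptions in " + elapsed + " ms");
    }
}
```

Running it under `top` or `vmstat` while varying the iteration count would show how the load splits between user and system time on a given JVM; the split is likely to differ between JVM versions.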

We know that after 10 hours of stress the GC goes crazy, and a full GC sometimes takes 5 seconds. So we assume it has something to do with memory.
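One way to put numbers on the GC behaviour from inside the JVM is the `java.lang.management` API, available since Java 5. The following is a minimal sketch, not something the poster reported using; the class name `GcStats` and the one-second polling interval are assumptions made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcStats {
    // Returns one line per collector with its cumulative collection count
    // and cumulative collection time in milliseconds.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            sb.append(bean.getName())
              .append(": count=").append(bean.getCollectionCount())
              .append(", timeMs=").append(bean.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few snapshots; a jump in timeMs between two snapshots
        // points at a long collection in that interval.
        for (int i = 0; i < 3; i++) {
            System.out.print(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Comparing the timestamps of long collections with the SYS CPU spikes would help show whether the two actually coincide.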

jstack during that period showed all threads as blocked. pstack during that time occasionally showed a MarkSweep stack trace, so we cannot be sure about this either. Sending SIGQUIT yielded nothing: Java dumped the stack trace only AFTER the SYS% load period was over.
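For completeness, a thread dump can also be taken from inside the process with `Thread.getAllStackTraces()`, available since Java 5. This is only a sketch with a hypothetical class name, `InProcessThreadDump`. Since the SIGQUIT dump was delayed until the stall ended, an in-process dump may well be delayed in the same way; nothing in the thread indicates that it would behave differently.

```java
import java.util.Map;

public class InProcessThreadDump {
    // Builds a textual dump of every live thread: name, state and stack frames.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState())
              .append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```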

We are now trying to reproduce this problem with a small fragment of code so that we can ask Sun.

If you know what is causing this, please let us know. We are clueless, so any idea is welcome :)

Thanks for your time.

Recommended answer

Thanks, everyone, for your help.

Eventually we upgraded to JDK 1.6 (only half of the Java servers) and the problem disappeared. Just don't use 1.5.0_10 :)

We managed to reproduce the problem just by accessing null pointers (this boosts SYS rather than US CPU time and kills the entire Linux machine).

Thanks again, everyone.
