JVM OutOfMemory error "death spiral" (not memory leak)


Problem description


We have recently been migrating a number of applications from RedHat Linux with JDK 1.6.0_03 to Solaris 10u8 with JDK 1.6.0_16 (on much higher-spec machines), and we have noticed what seems to be a rather pressing problem: under certain loads our JVMs get themselves into a "death spiral" and eventually run out of memory. Things to note:


  • This is not a case of a memory leak. These are applications that have been running just fine (in one case for over 3 years), and the out-of-memory errors are not deterministic: sometimes the applications work, sometimes they don't.
  • This is not caused by a move to a 64-bit VM; we are still running 32-bit.
  • In one case, using the latest G1 garbage collector on 1.6.0_18 seems to have solved the problem. In another, moving back to 1.6.0_03 has worked.
  • Sometimes our apps fall over with HotSpot SIGSEGV errors.
  • This affects applications written in Java as well as Scala.


The most important point is this: the behaviour manifests itself in applications that suddenly receive a deluge of data (usually via TCP). It's as if the VM decides to keep adding more data (possibly promoting it to the tenured generation, TG) rather than running a GC on the young generation ("newspace"), until it realises that it has to do a full GC; and then, despite practically everything in the VM being garbage, it somehow decides not to collect it!
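
One way to test the "promoted to the TG instead of collected in newspace" hypothesis is to watch each heap pool separately. The sketch below uses the standard `java.lang.management` API to print, for every heap pool, the bytes in use now and the bytes in use just after that pool's last collection. Pool names depend on the collector (for example "PS Eden Space" and "PS Old Gen" with the parallel collector on HotSpot), so treat them as opaque labels. If the old-generation figure after the last GC keeps climbing even across full collections, the data is probably still reachable; if it drops back, the collector is reclaiming it. The class and method names are hypothetical, and this is a rough diagnostic sketch rather than a monitoring tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class PoolSnapshot {

    /** Used bytes per heap pool right now, keyed by the collector-specific pool name. */
    public static Map<String, Long> heapPoolUsage() {
        Map<String, Long> result = new LinkedHashMap<String, Long>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage now = pool.getUsage(); // null if the pool is no longer valid
            if (now != null) result.put(pool.getName(), now.getUsed());
        }
        return result;
    }

    /** Used bytes per heap pool as measured just after that pool's most recent GC. */
    public static Map<String, Long> heapPoolUsageAfterLastGc() {
        Map<String, Long> result = new LinkedHashMap<String, Long>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) result.put(pool.getName(), afterGc.getUsed());
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples; in a real application run this on a background thread.
        for (int i = 0; i < 3; i++) {
            Map<String, Long> afterGc = heapPoolUsageAfterLastGc();
            for (Map.Entry<String, Long> e : heapPoolUsage().entrySet()) {
                Long post = afterGc.get(e.getKey());
                System.out.printf("%-22s used=%,d KB  afterLastGC=%s%n",
                        e.getKey(), e.getValue() / 1024,
                        post == null ? "n/a" : String.format("%,d KB", post / 1024));
            }
            System.out.println("---");
            Thread.sleep(1000);
        }
    }
}
```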


It sounds crazy, but I just don't see what else it could be. How else can you explain an app that one minute falls over with a maximum heap of 1 GB and the next works just fine, never going above 256 MB, while doing exactly the same thing?

So my questions are:



  1. Has anyone else observed this kind of behaviour?
  2. Does anyone have any suggestions as to how I might debug the JVM itself (as opposed to my app)? How do I prove this is a VM issue?
  3. Are there any VM-specialist forums where I can ask the VM's authors (assuming they aren't on SO)? (We have no support contract.)
  4. If this is a bug in the latest versions of the VM, how come no one else has noticed it?
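
A common first step for question 2 is to get the VM's own record of what the collector did. On the Java 6 HotSpot VMs discussed here, `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>` writes a GC log, and `-XX:+HeapDumpOnOutOfMemoryError` writes an `.hprof` heap dump at the moment of the OutOfMemoryError, which shows whether the heap really was full of garbage. (JDK 9 and later replace most of the logging flags with `-Xlog:gc*`.) For an in-process view that can be correlated with the incoming TCP load, a minimal sketch using the standard management beans follows; the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcWatcher {

    /** Heap bytes currently in use, as reported by the MemoryMXBean. */
    public static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Collection count per collector; a value of -1 means the VM does not report it. */
    public static Map<String, Long> collectionCounts() {
        Map<String, Long> counts = new HashMap<String, Long>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> previous = collectionCounts();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            Map<String, Long> current = collectionCounts();
            StringBuilder line = new StringBuilder();
            // Print how many collections each collector ran during the last interval.
            for (Map.Entry<String, Long> e : current.entrySet()) {
                Long before = previous.get(e.getKey());
                long delta = (before == null) ? e.getValue() : e.getValue() - before;
                line.append(e.getKey()).append(" +").append(delta).append("  ");
            }
            line.append(String.format("heap used: %,d KB", usedHeapBytes() / 1024));
            System.out.println(line);
            previous = current;
        }
    }
}
```

If the young-generation collector's count stops increasing while heap usage climbs during a burst of traffic, that would support the "no GC on newspace" theory described in the question.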


Recommended answer


Interesting problem. It sounds like one of the garbage collectors works poorly in your particular situation.


Have you tried changing the garbage collector being used? There are a LOT of GC options, and figuring out which ones are optimal seems to be a bit of a black art, but I wonder if a basic change would work for you.


I know there is a "Server" GC that tends to work a lot better than the default ones. Are you using that?
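
"Server GC" and "Threaded GC" are informal names, so a quick way to answer "are you using that?" is to ask the running VM which collectors it created. The sketch below lists the `GarbageCollectorMXBean` names and maps them to a collector family. The mapping assumes HotSpot's bean names ("PS Scavenge"/"PS MarkSweep" for the parallel collector, "ParNew"/"ConcurrentMarkSweep" for CMS, "Copy"/"MarkSweepCompact" for serial, "G1 Young Generation" for G1); other VMs will simply fall through to "Unknown". On Java 6 HotSpot the collector is chosen with `-XX:+UseSerialGC`, `-XX:+UseParallelGC` or `-XX:+UseConcMarkSweepGC`, and G1 (1.6.0_14 onwards) needed `-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC`. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class WhichGc {

    /** Names of all garbage collectors the running VM reports. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Rough guess at the collector family, assuming HotSpot's bean naming. */
    public static String describe(List<String> names) {
        if (names.contains("G1 Young Generation")) return "G1";
        if (names.contains("ConcurrentMarkSweep")) return "CMS";
        if (names.contains("PS Scavenge")) return "Parallel (throughput)";
        if (names.contains("ParNew")) return "ParNew + serial old";
        if (names.contains("Copy") || names.contains("MarkSweepCompact")) return "Serial";
        return "Unknown: " + names;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("VM:          " + System.getProperty("java.vm.name"));
        System.out.println("Collectors:  " + names);
        System.out.println("Looks like:  " + describe(names));
    }
}
```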


Threaded GC (which I believe is the default) is probably the worst for your particular situation; I've noticed that it tends to be much less aggressive when the machine is busy.


One thing I've noticed: it often takes two GCs to convince Java to actually take out the trash. I think the first one tends to unlink a bunch of objects and the second actually deletes them. What you might want to do is occasionally force two garbage collections. This WILL cause a significant GC pause, but I've never seen a case where it took more than two to clean out the entire heap.
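
A minimal sketch of "force two garbage collections", which also measures how much each request freed so you can check whether the second one actually helps in your application. `System.gc()` is only a request: the VM may ignore it (for example under `-XX:+DisableExplicitGC`), and with the default HotSpot collectors it typically triggers a stop-the-world full collection, which is the pause warned about above. One documented reason a second collection can free more than the first is finalization: objects with a `finalize()` method are only queued for finalization by the collection that finds them unreachable and are reclaimed by a later one. The class and method names are hypothetical.

```java
public class DoubleGc {

    /** Approximate heap bytes in use (total heap minus free heap). */
    public static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Requests two collections back to back and returns the used heap in bytes:
     * before, after the first request, and after the second request.
     */
    public static long[] collectTwice() {
        long before = usedHeap();
        System.gc(); // a hint only; the VM may ignore it
        long afterFirst = usedHeap();
        System.gc();
        long afterSecond = usedHeap();
        return new long[] {before, afterFirst, afterSecond};
    }

    public static void main(String[] args) {
        long[] r = collectTwice();
        System.out.printf("before: %,d KB, after 1st GC: %,d KB, after 2nd GC: %,d KB%n",
                r[0] / 1024, r[1] / 1024, r[2] / 1024);
    }
}
```

Calling `collectTwice()` only occasionally (for example from a low-frequency scheduled task) keeps the extra pauses bounded, in line with the "occasionally" in the advice above.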

