Java ConcurrentMarkSweep garbage collector not removing all garbage


Problem description


Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC
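(Not part of the original question, but a quick way to confirm the effective values of these -XX flags from inside the running process is to query the HotSpot diagnostic MBean; this is only a sketch, assuming the com.sun.management.HotSpotDiagnosticMXBean that ships with Sun JDK 6 is available:)

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

public class PrintGcFlags {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // "com.sun.management:type=HotSpotDiagnostic" is the name the HotSpot
        // diagnostic MBean is registered under on Sun JDK 6.
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // The -XX flags from the command line above whose effective values we
        // want to confirm at runtime.
        String[] flags = { "UseConcMarkSweepGC", "UseParNewGC",
                "CMSParallelRemarkEnabled", "CMSInitiatingOccupancyFraction",
                "DisableExplicitGC" };
        for (String flag : flags) {
            VMOption opt = diag.getVMOption(flag);
            System.out.println(opt.getName() + " = " + opt.getValue()
                    + " (origin: " + opt.getOrigin() + ")");
        }
    }
}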

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

This sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc.), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which the same pattern continues: after each full GC, more and more of the old gen is full).

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.
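Since the update above implicates JConsole/JTop itself, one low-impact way to watch the same counters without attaching a remote client is to poll the platform GarbageCollectorMXBeans from a thread inside the application. The sketch below is only illustrative, not code from the original question; with -XX:+UseConcMarkSweepGC -XX:+UseParNewGC the two beans are normally named "ParNew" and "ConcurrentMarkSweep":

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcCountLogger implements Runnable {
    public void run() {
        List<GarbageCollectorMXBean> gcBeans =
                ManagementFactory.getGarbageCollectorMXBeans();
        while (true) {
            // Expected bean names with this flag combination: "ParNew" for the
            // young generation and "ConcurrentMarkSweep" for the old generation.
            for (GarbageCollectorMXBean gc : gcBeans) {
                System.out.println(gc.getName()
                        + ": count=" + gc.getCollectionCount()
                        + " time=" + gc.getCollectionTime() + "ms");
            }
            try {
                Thread.sleep(60000L); // log once a minute
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        // Could equally be started as a daemon thread inside the application.
        new Thread(new GcCountLogger()).start();
    }
}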

Using jmap -histo and jmap -histo:live in sequence (a rough sketch for diffing the two snapshots follows the list), I am able to determine that the apparently uncollected objects consist of:

  • several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
  • several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
  • several million HashSet, Hashtable, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)
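
The histogram diff mentioned above can be automated; the following is a rough sketch (not from the original question) that compares a saved jmap -histo snapshot with a jmap -histo:live snapshot and reports the classes whose instance counts collapse after the live pass. It assumes the usual JDK 6 jmap -histo layout of rank, instance count, byte count, and class name columns:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class HistoDiff {
    // Parse one jmap -histo output file into a map of class name -> instance count.
    // Assumes data rows look like: "   1:   1234567   89012345  java.util.HashMap".
    static Map<String, Long> parse(String file) throws Exception {
        Map<String, Long> counts = new HashMap<String, Long>();
        BufferedReader in = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.trim().split("\\s+");
                if (cols.length < 4 || !cols[0].endsWith(":")) {
                    continue; // skip the header, separator and "Total" lines
                }
                counts.put(cols[3], Long.valueOf(cols[1]));
            }
        } finally {
            in.close();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> before = parse(args[0]); // output of: jmap -histo <pid>
        Map<String, Long> after = parse(args[1]);  // output of: jmap -histo:live <pid>
        for (Map.Entry<String, Long> e : before.entrySet()) {
            Long live = after.get(e.getKey());
            long dropped = e.getValue().longValue()
                    - (live == null ? 0L : live.longValue());
            if (dropped > 100000L) { // arbitrary threshold: only report large drops
                System.out.println(e.getKey() + ": " + dropped + " instances collected");
            }
        }
    }
}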

Some excerpts from the GC output appear below; my reading of them is that the CMS GC is being aborted without falling back to a stop-the-world GC. Am I misreading this output? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: user=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

Thanks in advance!

Solution

com.sun.jmx.remote.util.OrderClassLoader is used in the remoting layer for JMX, and a quick review of the code suggests these loaders are created as part of the unmarshalling process for remote requests inside the JVM. The lifetime of each classloader is tied directly to the lifetime of the thing that was unmarshalled, so once there are no longer any references to that thing, the classloader can be released.
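
(For illustration only, and not part of the original answer: every remote operation issued by a client such as JConsole, even a simple attribute read like the hedged sketch below, has to be unmarshalled on the server side and can leave one of these short-lived classloaders behind until it is collected. The service URL, host, and port here are placeholders.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapPoller {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL: assumes the target JVM exposes a remote JMX
        // connector, e.g. via -Dcom.sun.management.jmxremote.port=9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // Each remote getAttribute call is a request the server has to unmarshal.
            CompositeData heap =
                    (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
            System.out.println("heap used = " + heap.get("used") + " bytes");
        } finally {
            connector.close();
        }
    }
}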

I wouldn't be surprised if, in this case, the presence of these instances is a direct result of using JConsole to examine the goings-on in the JVM; it looks like they would simply be cleaned up by the GC as part of normal operation.

It's possible there's a bug in the JMX implementation (which seems unlikely in a relatively up-to-date JVM), or perhaps you have some custom MBeans or custom JMX tooling that is causing the problem. Ultimately, though, I suspect OrderClassLoader is a red herring and the issue lies elsewhere (a broken GC or some other leak).
