偶尔遇到很长的垃圾收集延迟,为什么? [英] Experiencing occasional long garbage collection delays, why?

查看:129
本文介绍了偶尔遇到很长的垃圾收集延迟,为什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我的应用程序要求没有GC需要2秒以上的时间,理想情况下小于100毫秒。



基于以前的一些建议,我尝试了以下命令行选项:

  java -XX:MaxGCPauseMillis = 100 -XX:NewRatio = 9 -XX:+ UseConcMarkSweepGC -XX:+ CMSIncrementalMode -XX:+ PrintGCDetails -XX:+ PrintGCTimeStamps -server -Xmx9g -Xms9g 

应用程序有大量的长期存储对象,它们保存在ConcurrentLinkedHashMap中。我看到了偶尔的长时间停顿,最糟糕的情况下长达10秒(这是下面GC日志的倒数第二次)!

这里是一些输出我得到:

  16938.968:[GC 16938.968:[ParNew:153343K-> 17022K(153344K),7.8608580 secs ] 6184328K-> 6122510K(9420160K)icms_dc = 7,7.8614100秒] [时间:用户= 0.63 sys = 0.01,实际= 7.86秒] 
16947.087:[GC 16947.087:[ParNew:153342K-> 17022K [时间:用户= 0.44 sys = 0.00,实际= 7.27秒]
16954.614:[GC 16954.614:[ParNew:153342K],7.2604030 secs] 6258830K-> 6198642K(9420160K)icms_dc = 7,7.2609780 secs] - > 17024K(153344K),8.4307620秒] 6334962K-> 6274625K(9420160K)icms_dc = 7,8.4313150秒] [时间:用户= 0.62 sys = 0.01,实际= 8.43秒]
16963.310:[GC 16963.310 :[ParNew:153344K-> 17023K(153344K),6.2588760secs] 6410945K-> 6350748K(9420160K)icms_dc = 7,6.2594290secs] [Times:user = 0.48sys = 0.01,real = 6.25secs]
16969.834:[GC 16969.834:[P arNew:153343K-> 17022K(153344K),6.0274280秒] 6487068K-> 6425868K(9420160K)icms_dc = 7,6.0279830秒] [时间:用户= 0.50 sys = 0.01,实际= 6.03秒]
16976.122: [GC 16976.123:[ParNew:153342K-> 17022K(153344K),11.7774620secs] 6562188K-> 6503030K(9420160K)icms_dc = 7,11.7780180secs] [Times:user = 0.43sys = 0.04,real = 11.78secs] b $ b 16988.164:[GC 16988.164:[ParNew:153342K-> 17024K(153344K),10.9477920 secs] 6639350K-> 6579928K(9420160K)icms_dc = 7,10.9483440sec] [Times:user = 0.37 sys = 0.02,real = 10.95秒]
16999.371:[GC 16999.372:[ParNew:153344K-> 17023K(153344K),9.8828360秒] 6716248K-> 6655886K(9420160K)icms_dc = 7,9.8833940秒] [Times:user = 0.42 [GC 17009.509:[ParNew:153343K-> 17023K(153344K),5.0699960sec] 6792206K-> 6727987K(9420160K)icms_dc = 7,5.0705660secs] [时间:用户= 0.55 sys = 0.01,实际= 5.07秒]
17014.838:[GC 17014.838:[ParNew:153343K-> 17023K(153344K),6.641175 0秒] 6864307K-> 6790974K(9420160K)icms_dc = 7,6.6417400秒] [时间:用户= 0.37 sys = 0.01,实际= 6.63秒]
17021.735:[GC 17021.735:[ParNew:153343K-> 17024K(153344K),8.0545970秒] 6927294K-> 6856409K(9420160K)icms_dc = 7,8.0551790秒] [时间:用户= 0.34 sys = 0.03,real = 8.05秒]
17030.052:[GC 17030.053:[ParNew :153344K-> 17023K(153344K),7.9756730秒] 6992729K-> 6922569K(9420160K)icms_dc = 7,7.9762530秒] [时间:用户= 0.34 sys = 0.01,实际= 7.98秒]
17038.398: GC 17038.398:[ParNew:153343K-> 17022K(153344K),12.9613300 secs] 7058889K-> 6990725K(9420160K)icms_dc = 7,12.9618850secs] [Times:user = 0.39sys = 0.01,real = 12.96secs]
17051.630:[GC 17051.630:[ParNew:153342K-> 17022K(153344K),6.8942910secs] 7127045K-> 7059607K(9420160K)icms_dc = 7,6.8948380secs] [Times:user = 0.56 sys = 0.02,real = 6.89秒]
17058.798:[GC 17058.798:[ParNew:153342K-> 17024K(153344K),10.0262190 secs] 7195927K-> 7126351K(9420160K)icms_d c = 7,10.0267860秒] [时间:用户= 0.37 sys = 0.01,实际= 10.02秒]
17069.096:[GC 17069.096:[ParNew:153344K-> 17023K(153344K),10.0419500 secs] 7262671K-> ; 7195002K(9420160K)icms_dc = 7,10.0425020秒] [时间:用户= 0.40 sys = 0.02,实际= 10.04秒]
17079.410:[GC 17079.410:[ParNew:153343K-> 17022K(153344K),13.5389040秒] 7331322K-> 7264275K(9420160K)icms_dc = 7,13.5394610秒] [时间:用户= 0.30 sys = 0.01,实际= 13.54秒]
17093.223:[GC 17093.224:[ParNew:153342K-> 17023K (153344K),10.5909450秒] 7400595K-> 7330446K(9420160K)icms_dc = 7,10.5915060秒] [时间:用户= 0.33 sys = 0.00,实际= 10.58秒]
17104.083:[GC17104.084:[ParNew: 153343K-> 17024K(153344K),5.8420210秒] 7466766K-> 7392173K(9420160K)icms_dc = 7,5.8425920秒] [时间:用户= 0.57 sys = 0.00,实际= 5.84秒]

我花了数小时浏览描述Java GC调优的各种网页,但没有人真的让我有能力解释上面的日志,并提出一个行动方案。根据我提供的日志提供的具体建议将不胜感激。



更新:通过以下问题:



机器具有16G的RAM,这里是从顶部的信息:
Mem:总共15483904k,使用15280084k,203820k免费,155684k缓冲区
交换:2031608k总计, 1347240k used,684368k free,3304044k cached



它是一个不同的运行,但这里是当前流程的最高输出:

  PID用户PR NI VIRT RES SHR S%CPU%MEM时间+命令
1016智能17 0 10.2g 6.5g 9464 S 1 44.2 10:24.32 java

更新2:一些更详细的日志记录,看起来好像会导致400ms的延迟:

  {Heap before GC invocations = 1331(full 1):
par new generation total 153344K,used 153343K [0x00002aaaae200000 ,0x00002aaab8860000,0x00002aaab8860000)
eden空间使用100%使用[0x00002aaaae200000,0x00002aaab6720000,0x00002aaab6720000)
从空间17024K使用99%使用[0x00002aaab77c0000,0x00002aaab885fff0,0x00002aaab8860000)
使用空间17024K,使用0%[0x00002aaab6720000,0x00002aaab6720000,0x00002aaab77c0000)
并发标记扫描生成总计7169664K,使用4258496K [0x00002aaab8860000,0x00002aac6e200000,0x00002aac6e200000)
并发标记扫描转换生成总计21248K,使用13269K [0x00002aac6e200000,0x00002aac6f6c0000,0x00002aac73600000)
484.738:[GC 484.738:[ParNew:153343K-> 17022K(153344K),0.3950480秒] 4411840K-> 4341689K(7323008K),0.3954820秒] [时间:用户= 0.49 sys = 0.07,实际= 0.40秒]
堆后GC invocations = 1332(full 1):
par new generation total 153344K,used 17022K [0x00002aaaae200000,0x00002aaab8860000,0x00002aaab8860000)
eden space 136320K,0%used [0x00002aaaae200000,0x00002aaaae200000,0x00002aaab6720000)
来自太空17024K,99%使用[0
到空间17024K,使用0%[0x00002aaab77c0000,0x00002aaab77c0000,0x00002aaab8860000)
并发标记扫描生成总计7169664K,使用4324666K [0x00002aaab8860000,0x00002aac6e200000,0x00002aac6e200000)
并发标记扫描生成总计-mark-sweep perm gen total 21248K,used 13269K [0x00002aac6e200000,0x00002aac6f6c0000,0x00002aac73600000)
}


原来问题是堆被换出到磁盘,并且延迟是Java的GC不得不等待它被换回。



通过将Linux的swappiness参数设置为0解决(主要)。


I'm having a hard time dealing with a Java garbage collection problem, and interpreting the logs.

My application requires that no GC takes longer than 2 seconds, and ideally less than 100ms.

Based on some previous advice I'm trying the following command line options:

 java -XX:MaxGCPauseMillis=100 -XX:NewRatio=9 -XX:+UseConcMarkSweepGC  -XX:+CMSIncrementalMode -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -server -Xmx9g -Xms9g 

The application has a large amount of long-term stored objects, which are kept in a ConcurrentLinkedHashMap. I'm seeing occasional long pauses, in the worst case up to 10 seconds (this is the second last like of the GC logs below)!

Here is some of the output I'm getting:

16938.968: [GC 16938.968: [ParNew: 153343K->17022K(153344K), 7.8608580 secs] 6184328K->6122510K(9420160K) icms_dc=7 , 7.8614100 secs] [Times: user=0.63 sys=0.01, real=7.86 secs] 
16947.087: [GC 16947.087: [ParNew: 153342K->17022K(153344K), 7.2604030 secs] 6258830K->6198642K(9420160K) icms_dc=7 , 7.2609780 secs] [Times: user=0.44 sys=0.00, real=7.27 secs] 
16954.614: [GC 16954.614: [ParNew: 153342K->17024K(153344K), 8.4307620 secs] 6334962K->6274625K(9420160K) icms_dc=7 , 8.4313150 secs] [Times: user=0.62 sys=0.01, real=8.43 secs] 
16963.310: [GC 16963.310: [ParNew: 153344K->17023K(153344K), 6.2588760 secs] 6410945K->6350748K(9420160K) icms_dc=7 , 6.2594290 secs] [Times: user=0.48 sys=0.01, real=6.25 secs] 
16969.834: [GC 16969.834: [ParNew: 153343K->17022K(153344K), 6.0274280 secs] 6487068K->6425868K(9420160K) icms_dc=7 , 6.0279830 secs] [Times: user=0.50 sys=0.01, real=6.03 secs] 
16976.122: [GC 16976.123: [ParNew: 153342K->17022K(153344K), 11.7774620 secs] 6562188K->6503030K(9420160K) icms_dc=7 , 11.7780180 secs] [Times: user=0.43 sys=0.04, real=11.78 secs] 
16988.164: [GC 16988.164: [ParNew: 153342K->17024K(153344K), 10.9477920 secs] 6639350K->6579928K(9420160K) icms_dc=7 , 10.9483440 secs] [Times: user=0.37 sys=0.02, real=10.95 secs] 
16999.371: [GC 16999.372: [ParNew: 153344K->17023K(153344K), 9.8828360 secs] 6716248K->6655886K(9420160K) icms_dc=7 , 9.8833940 secs] [Times: user=0.42 sys=0.01, real=9.88 secs] 
17009.509: [GC 17009.509: [ParNew: 153343K->17023K(153344K), 5.0699960 secs] 6792206K->6727987K(9420160K) icms_dc=7 , 5.0705660 secs] [Times: user=0.55 sys=0.01, real=5.07 secs] 
17014.838: [GC 17014.838: [ParNew: 153343K->17023K(153344K), 6.6411750 secs] 6864307K->6790974K(9420160K) icms_dc=7 , 6.6417400 secs] [Times: user=0.37 sys=0.01, real=6.63 secs] 
17021.735: [GC 17021.735: [ParNew: 153343K->17024K(153344K), 8.0545970 secs] 6927294K->6856409K(9420160K) icms_dc=7 , 8.0551790 secs] [Times: user=0.34 sys=0.03, real=8.05 secs] 
17030.052: [GC 17030.053: [ParNew: 153344K->17023K(153344K), 7.9756730 secs] 6992729K->6922569K(9420160K) icms_dc=7 , 7.9762530 secs] [Times: user=0.34 sys=0.01, real=7.98 secs] 
17038.398: [GC 17038.398: [ParNew: 153343K->17022K(153344K), 12.9613300 secs] 7058889K->6990725K(9420160K) icms_dc=7 , 12.9618850 secs] [Times: user=0.39 sys=0.01, real=12.96 secs] 
17051.630: [GC 17051.630: [ParNew: 153342K->17022K(153344K), 6.8942910 secs] 7127045K->7059607K(9420160K) icms_dc=7 , 6.8948380 secs] [Times: user=0.56 sys=0.02, real=6.89 secs] 
17058.798: [GC 17058.798: [ParNew: 153342K->17024K(153344K), 10.0262190 secs] 7195927K->7126351K(9420160K) icms_dc=7 , 10.0267860 secs] [Times: user=0.37 sys=0.01, real=10.02 secs] 
17069.096: [GC 17069.096: [ParNew: 153344K->17023K(153344K), 10.0419500 secs] 7262671K->7195002K(9420160K) icms_dc=7 , 10.0425020 secs] [Times: user=0.40 sys=0.02, real=10.04 secs] 
17079.410: [GC 17079.410: [ParNew: 153343K->17022K(153344K), 13.5389040 secs] 7331322K->7264275K(9420160K) icms_dc=7 , 13.5394610 secs] [Times: user=0.30 sys=0.01, real=13.54 secs] 
17093.223: [GC 17093.224: [ParNew: 153342K->17023K(153344K), 10.5909450 secs] 7400595K->7330446K(9420160K) icms_dc=7 , 10.5915060 secs] [Times: user=0.33 sys=0.00, real=10.58 secs] 
17104.083: [GC 17104.084: [ParNew: 153343K->17024K(153344K), 5.8420210 secs] 7466766K->7392173K(9420160K) icms_dc=7 , 5.8425920 secs] [Times: user=0.57 sys=0.00, real=5.84 secs] 

I've spent hours pouring over the various webpages that describe Java GC tuning, but none have really given me the ability to interpret the logs above and come up with a course of action. Any specific advice based on the logs I've provided would be greatly appreciated.

Update: Per a question below:

The machine has 16G of RAM, here is the info from top: Mem: 15483904k total, 15280084k used, 203820k free, 155684k buffers Swap: 2031608k total, 1347240k used, 684368k free, 3304044k cached

Its a different run, but here is the current top output for the process:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  1016 sanity   17   0 10.2g 6.5g 9464 S    1 44.2  10:24.32 java                      

Update 2: Some more detailed logging, this looked like it caused a 400ms delay:

{Heap before GC invocations=1331 (full 1):
 par new generation   total 153344K, used 153343K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
  eden space 136320K, 100% used [0x00002aaaae200000, 0x00002aaab6720000, 0x00002aaab6720000)
  from space 17024K,  99% used [0x00002aaab77c0000, 0x00002aaab885fff0, 0x00002aaab8860000)
  to   space 17024K,   0% used [0x00002aaab6720000, 0x00002aaab6720000, 0x00002aaab77c0000)
 concurrent mark-sweep generation total 7169664K, used 4258496K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
 concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
484.738: [GC 484.738: [ParNew: 153343K->17022K(153344K), 0.3950480 secs] 4411840K->4341689K(7323008K), 0.3954820 secs] [Times: user=0.49 sys=0.07, real=0.40 secs] 
Heap after GC invocations=1332 (full 1):
 par new generation   total 153344K, used 17022K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
  eden space 136320K,   0% used [0x00002aaaae200000, 0x00002aaaae200000, 0x00002aaab6720000)
  from space 17024K,  99% used [0x00002aaab6720000, 0x00002aaab77bfb68, 0x00002aaab77c0000)
  to   space 17024K,   0% used [0x00002aaab77c0000, 0x00002aaab77c0000, 0x00002aaab8860000)
 concurrent mark-sweep generation total 7169664K, used 4324666K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
 concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
}

解决方案

Turns out the problem was that the heap was getting swapped out to disk, and the delay was Java's GC having to wait for it to be swapped back in.

Solved (mostly) by setting Linux' "swappiness" parameter to 0.

这篇关于偶尔遇到很长的垃圾收集延迟,为什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆