为什么Java G1 gc花费这么多时间扫描RS? [英] Why is the Java G1 gc spending so much time scanning RS?

查看:585
本文介绍了为什么Java G1 gc花费这么多时间扫描RS?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在评估G1垃圾回收器以及它如何在我们的应用程序中执行。看着gc-log,我注意到很多收藏都有很长的Scan RS阶段:

  7968.869:[GC (ms):4030.4 4034.1 4032.0 4032.0 
平均:4032.1对象拷贝数(ms):6038.5 6033.3 6036.7 6037.1
平均值:6036.4,最小值:6033.3,最大值:6038.5,差值:5.2]
(...)
[Eden:19680M(19680M) - > 0B(20512M)幸存者:2688M-> 2624M堆:
75331M(111904M) - > 51633M(115744M)]
[Times:user = 40.49 sys = 0.02,real = 10.28 secs]

所有删除日志行条目显示运行时间在单位数毫秒。

我认为大部分时间应该花在复制,对吧? Scan RS需要这么长时间的原因是什么?有关如何调整G1设置的任何想法?
JVM以

  -Xms40960M -Xmx128G -XX:+ UseG1GC -verbose:gc -XX:+ PrintGCDetails -Xloggc:gc.log 

编辑:哦,我忘了...我正在使用Java 7月25日

更新:
我注意到另外两件奇怪的事情:

  16187.740:[GC并发标记开始] 
16203.934:[GC暂停(年轻),2.89871800秒]
(...)
16218.455 :[GC暂停(年轻),4.61375100秒]
(...)
16237.441:[GC暂停(年轻),4.46131800秒]
(...)
16257.785 :[GC暂停(年轻),4.73922600秒]
(...)
16275.417:[GC暂停(年轻),3.87863400秒]
(...)
16291.505 :[GC暂停(年轻),3.72626400秒]
(...)
16307.824:[GC暂停(年轻),3.72921700秒]
(...)
16325.851 :[GC暂停(年轻),3.91060700秒]
(...)
16354.600:[GC暂停(年轻),5.61306000秒]
(...)
16393.069 :[GC暂停(年轻),17.50453200秒]
(...)
1 6414.590:[GC并发标记结束,226.8497670秒]

运行正在执行中。我不确定这是否是有意的,但它似乎对我来说是错误的。无可否认,这是一个极端的例子,但我确实在我的日志中看到了这种行为。

另外一件事是我的JVM进程增长到了160克。考虑到128g的堆大小,这是一个相当大的开销。这是预料之中,还是G1泄漏内存?任何想法如何找到?



PS:我不确定我是否应该为更新提出新的问题...如果你们中的任何人认为这将是有益的,告诉我;)

更新2:
我猜G1真的可能是内存泄漏: http://printfdebugger.tumblr.com/post/19142660766/how-i-learned-to-love-cms-and-had-my-heart-broken-by-g1
因此现在是一个交易破坏者,我不会花更多的时间玩这个。
我还没试过的事情是配置区域大小(-XX:G1HeapRegionSize)并降低堆占用率(-XX:InitiatingHeapOccupancyPercent)。

解决方案

让我们来看看。



1 - 看起来你的GC配置为使用4个线程(或者你有4个vCPU,但不太可能考虑堆的大小)。对于一个128GB的堆来说这是相当低的,我期待的更多。



GC事件似乎发生在25+秒的时间间隔内。然而,您提供的日志摘录没有提及处理的区域数量。



=>您有没有指定暂停时间目标给G1GC( -XX:MaxGCPauseMillis = N )?


$ b 2 - 长时间扫描RSet时间



扫描RSet表示GC扫描记忆集时的时间。记住一组区域包含与指向该区域的参考相对应的卡片。这一阶段扫描那些寻找指向收集集合的所有区域的引用的卡片。

所以在这里,我们还有一个问题:



=>在该特定集合中处理了多少个区域(即CSet有多大)

3-长对象副本时间

顾名思义,复制时间是每个工作线程将收集集中区域的活动对象复制到另一个地区。



如此长的复印时间可能表明处理了很多区域,并且您可能希望减少该数量。它也可以建议交换,但这是不太可能的,因为你的用户/实际值在日志结尾。



4 - 现在要做什么

strong>



您应该在GC日志中检查已处理区域的数量。将此数字与您的区域大小关联起来,并推断已扫描的内存量。然后,您可以设置一个较小的暂停时间目标(例如,使用<$为500毫秒C $ C> -XX:MaxGCPauseMillis = 500 )。这将会增加GC事件的数量,减少每个GC所释放内存的数量。
$ b


  • 周期

  • 减少YGC期间的STW暂停



希望有帮助!



来源:


I'm currently evaluating the G1 garbage collector and how it performs for our application. Looking at the gc-log, I noticed a lot of collections have very long "Scan RS" phases:

7968.869: [GC pause (mixed), 10.27831700 secs]
   [Parallel Time: 10080.8 ms]
   (...)
      [Scan RS (ms):  4030.4  4034.1  4032.0  4032.0
       Avg: 4032.1, Min: 4030.4, Max: 4034.1, Diff:   3.7]
      [Object Copy (ms):  6038.5  6033.3  6036.7  6037.1
       Avg: 6036.4, Min: 6033.3, Max: 6038.5, Diff:   5.2]
   (...)
    [Eden: 19680M(19680M)->0B(20512M) Survivors: 2688M->2624M Heap:
     75331M(111904M)->51633M(115744M)]
 [Times: user=40.49 sys=0.02, real=10.28 secs] 

All the removed log-rows entries show runtimes in single-digit ms.

I think most of the time should be spent in copying, right? What could be the reason Scan RS takes so long? Any ideas on how to tweak the G1-settings? The JVM was started with

-Xms40960M -Xmx128G -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log

Edit: Oh, I forgot... I'm using Java 7u25

Update: I noticed two other weird things:

16187.740: [GC concurrent-mark-start]
16203.934: [GC pause (young), 2.89871800 secs]
(...)
16218.455: [GC pause (young), 4.61375100 secs]
(...)
16237.441: [GC pause (young), 4.46131800 secs]
(...)
16257.785: [GC pause (young), 4.73922600 secs]
(...)
16275.417: [GC pause (young), 3.87863400 secs]
(...)
16291.505: [GC pause (young), 3.72626400 secs]
(...)
16307.824: [GC pause (young), 3.72921700 secs]
(...)
16325.851: [GC pause (young), 3.91060700 secs]
(...)
16354.600: [GC pause (young), 5.61306000 secs]
(...)
16393.069: [GC pause (young), 17.50453200 secs]
(...)
16414.590: [GC concurrent-mark-end, 226.8497670 sec]

The concurrent GC run is continuing while parallel runs are being performed. I'm not sure if that's intended, but it kinda seems wrong to me. Admittedly, this is an extreme example, but I do see this behaviour all over my log.

Another thing is that my JVM process grew to 160g. Considering a heap-size of 128g, that's a rather large overhead. Is this to be expected, or is G1 leaking memory? Any ideas on how to find that out?

PS: I'm not really sure if I should've made new questions for the updates... if any of you think that this would be beneficial, tell me ;)

Update 2: I guess the G1 really may be leaking memory: http://printfdebugger.tumblr.com/post/19142660766/how-i-learned-to-love-cms-and-had-my-heart-broken-by-g1 As this is a deal-breaker for now, I'm not going to spend more time on playing with this. Things I didn't yet try is configuring region size (-XX:G1HeapRegionSize) and lowering the heap occupancy (-XX:InitiatingHeapOccupancyPercent).

解决方案

Let's see.

1 - First clues

It looks like your GC was configured to use 4 threads (or you have 4 vCPU, but it is unlikely given the size of the heap). It is quite low for a 128GB heap, I was expecting more.

The GC events seems to happen at 25+ seconds interval. However, the log extract you gave do not mention the number of regions that were processed.

=> By any chance, did you specify pause time goals to G1GC (-XX:MaxGCPauseMillis=N) ?

2 - Long Scan RSet time

"Scan RSet" means the time the GC spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set.

So here, we have one more question :

=> How many regions were processed during that particular collection (i.e. how big is the CSet)

3 - Long Object Copy time

The copy time, as the name suggest, is the time spend by each worker thread copying live objects from the regions in the Collection Set to the other regions.

Such long copy time can suggest that a lot of regions were processed, and that you may want to reduce that number. It could also suggest swapping, but this is very unlikely given your user/real values at the end of the log.

4 - Now what to do

You should check in the GC log the number of regions that were processed. Correlate this number with your region size and deduce the amount of memory that was scanned.

You can then set a smaller pause time goal (for instance, to 500ms using -XX:MaxGCPauseMillis=500). This will

  • increase the number of GC events,
  • reduce the amount of freed memory per GC cycle
  • reduce the STW pauses during YGC

Hope that helps !

Sources :

这篇关于为什么Java G1 gc花费这么多时间扫描RS?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆