Latency issues with G1GC


Problem description

I am facing a continuous increase in GC pauses with the G1GC algorithm. The service latencies continue to grow over time. Once this happens, I restart my service and the latencies go back to normal. Post startup, the latencies again continue to increase over time.

At the time of startup, the service latencies are around 200 ms, but within 24 hours they go up to 350 ms and continue to increase in a linear fashion.

The increase in service latencies matches the increase in the GarbageCollection metrics.

I am running a Java application (JDK 8) on M4-2X large EC2 boxes with 50 active threads per box. The service runs with a 12 GB heap. The average latency of a request is about 250 ms, and the rate of incoming requests is about 20 per second per box.

        <jvmarg line="-Xms12288M"/>
        <jvmarg line="-Xmx12288M"/>

        <jvmarg line="-verbose:gc" />
        <jvmarg line="-XX:+UseG1GC"/>
        <jvmarg line="-XX:+PrintGCDetails" />
        <jvmarg line="-XX:+PrintGCTimeStamps" />
        <jvmarg line="-XX:+PrintTenuringDistribution" />
        <jvmarg line="-XX:+PrintGCApplicationStoppedTime" />
        <jvmarg line="-XX:MaxGCPauseMillis=250"/>
        <jvmarg line="-XX:ParallelGCThreads=20" />
        <jvmarg line="-XX:ConcGCThreads=5" />
        <jvmarg line="-XX:-UseGCLogFileRotation"/>



GC logs

79488.355: Total time for which application threads were stopped: 0.0005309 seconds, Stopping threads took: 0.0000593 seconds
79494.559: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 369098752 bytes, new threshold 15 (max 15)
- age   1:   64725432 bytes,   64725432 total
- age   2:    8867888 bytes,   73593320 total
- age   3:    2503592 bytes,   76096912 total
- age   4:     134344 bytes,   76231256 total
- age   5:    3729424 bytes,   79960680 total
- age   6:     212000 bytes,   80172680 total
- age   7:     172568 bytes,   80345248 total
- age   8:     175312 bytes,   80520560 total
- age   9:     282480 bytes,   80803040 total
- age  10:     160952 bytes,   80963992 total
- age  11:     140856 bytes,   81104848 total
- age  12:     153384 bytes,   81258232 total
- age  13:     123648 bytes,   81381880 total
- age  14:      76360 bytes,   81458240 total
- age  15:      63888 bytes,   81522128 total
, 2.5241014 secs]
   [Parallel Time: 2482.2 ms, GC Workers: 20]
      [GC Worker Start (ms): Min: 79494558.9, Avg: 79494567.4, Max: 79494602.1, Diff: 43.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]
      [Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]
         [Processed Buffers: Min: 0, Avg: 23.2, Max: 80, Diff: 80, Sum: 465]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 4.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]
      [Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]
         [Termination Attempts: Min: 1, Avg: 21.5, Max: 68, Diff: 67, Sum: 430]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
      [GC Worker Total (ms): Min: 2435.8, Avg: 2470.7, Max: 2482.0, Diff: 46.2, Sum: 49414.5]
      [GC Worker End (ms): Min: 79497037.9, Avg: 79497038.1, Max: 79497041.0, Diff: 3.1]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 40.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 37.7 ms]
      [Ref Enq: 0.8 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 1.3 ms]
   [Eden: 5512.0M(5512.0M)->0.0B(4444.0M) Survivors: 112.0M->128.0M Heap: 8222.2M(12.0G)->2707.5M(12.0G)]
 [Times: user=19.63 sys=0.18, real=2.53 secs]
79497.083: Total time for which application threads were stopped: 2.5252654 seconds, Stopping threads took: 0.0000914 seconds

I am looking for some help with the GC configuration. Based on my reading, I am planning to increase the number of parallel GC threads to 32, set G1HeapRegionSize to 16M, and set ConcGCThreads to 8.
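For reference, a sketch of those planned changes in the same jvmarg form as the configuration above (these values are only the plan stated here, not a verified tuning recommendation):

        <jvmarg line="-XX:ParallelGCThreads=32" />
        <jvmarg line="-XX:ConcGCThreads=8" />
        <jvmarg line="-XX:G1HeapRegionSize=16M" />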

GC pause statistics:

Metric            | Mixed               | Concurrent Mark      | Remark               | Cleanup              | initial-mark         | Young GC           | Total
Count             | 14                  | 4                    | 4                    | 4                    | 4                    | 263                | 293
Total GC Time     | 4 sec 120 ms        | 0                    | 1 sec 100 ms         | 70 ms                | 980 ms               | 1 min 8 sec 10 ms  | 1 min 14 sec 280 ms
Avg GC Time       | 294 ms              | 0                    | 275 ms               | 17 ms                | 245 ms               | 259 ms             | 254 ms
Avg Time std dev  | 127 ms              | 0                    | 73 ms                | 4 ms                 | 73 ms                | 63 ms              | 79 ms
Min/Max Time      | 0 / 560 ms          | 0 / 0                | 0 / 400 ms           | 0 / 20 ms            | 0 / 340 ms           | 0 / 620 ms         | 0 / 620 ms
Avg Interval Time | 2 min 55 sec 119 ms | 12 min 32 sec 443 ms | 12 min 32 sec 443 ms | 12 min 32 sec 449 ms | 12 min 32 sec 423 ms | 13 sec 686 ms      | 51 sec 887 ms



GC Causes

Cause                   | Count | Avg Time | Max Time | Total Time          | Time %
G1 Evacuation Pause     | 263   | 259 ms   | 560 ms   | 1 min 8 sec 50 ms   | 91.61%
GCLocker Initiated GC   | 15    | 272 ms   | 400 ms   | 4 sec 80 ms         | 5.49%
Others                  | 12    | n/a      | n/a      | 1 sec 250 ms        | 1.68%
G1 Humongous Allocation | 3     | 300 ms   | 340 ms   | 900 ms              | 1.21%
Total                   | 293   | n/a      | n/a      | 1 min 14 sec 280 ms | 99.99%



Tenuring summary

Desired Survivor Size: 448.0 mb, Max Threshold: 15

Age    | Survival Count | Average size (kb) | Average Total 'To' size (kb)
age 1  | 281            | 54856.84          | 54856.84
age 2  | 273            | 32935.6           | 89227.65
age 3  | 258            | 29812.41          | 122175.68
age 4  | 235            | 28499.48          | 158266.46
age 5  | 214            | 27909.13          | 196528.23
age 6  | 192            | 26896.33          | 237892.45
age 7  | 180            | 25759.58          | 272516.81
age 8  | 174            | 23565.21          | 299092.37
age 9  | 166            | 21745.62          | 320927.73
age 10 | 149            | 19323.6           | 340228.24
age 11 | 125            | 17400.14          | 357569.6
age 12 | 96             | 13995.26          | 372030.12
age 13 | 55             | 10909.19          | 378053.14
age 14 | 38             | 10197.95          | 389146.13
age 15 | 22             | 5996.65           | 395657.37


Recommended answer

The first point is: you need to check whether there is any connection leak in your application.

But there is one aspect of G1GC which you can analyze:

Humongous Objects

At this point the majority of the functionality and architecture of G1GC has been fleshed out, with the exception of the biggest weakness/complexity: the Humongous object. As mentioned previously, any single data allocation ≥ G1HeapRegionSize/2 is considered a Humongous object, which is allocated out of contiguous regions of Free space and then added to Tenured. Let's run through some basic characteristics and how they affect the normal GC lifecycle. The following discussion will provide insight into the downsides of Humongous objects, such as:

Increase the risk of running out of Free space and triggering a Full GC
Increase overall time spent in STW

Humongous objects are allocated out of Free space. Allocation failures trigger GC events. If an allocation failure from Free space triggers a GC, the GC event will be a Full GC, which is very undesirable in most circumstances. To avoid Full GC events in an application with lots of Humongous objects, one must ensure the Free space pool is large enough, compared to Eden, that Eden will always fill up first. One usually ends up being over-cautious, and the application ends up in a state where the Free RAM pool is quite large and never fully utilized, which is by definition wasting RAM.
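To put rough numbers on the "≥ G1HeapRegionSize/2" rule for the setup in this question (an estimate that assumes G1's default region sizing, which targets on the order of 2048 regions and rounds the region size down to a power of two): a 12 GB heap works out to 12288 MB / 2048 = 6 MB, rounded down to 4 MB regions, so any single allocation of roughly 2 MB or more would be treated as Humongous. With the planned -XX:G1HeapRegionSize=16M, that threshold rises to about 8 MB, and fewer allocations qualify. Verify the actual region size from your GC logs rather than trusting this arithmetic.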

Humongous objects are freed at the end of an MPCMC (multi-phase concurrent marking cycle).

Until around Oracle JDK 8u45, Humongous objects really were only collected at the end of a run of the MPCMC. The release notes for Oracle 8u45-8u65 contain a few commits indicating that some, but not all, Humongous objects are collected during minor GC events.

So, you can try updating to the latest Java.
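(One hedged, specific data point on that suggestion: as far as I recall, the main relevant change arrived around JDK 8u60, when G1 gained "eager reclaim" of short-lived Humongous objects during young collections, controlled by -XX:+G1EagerReclaimHumongousObjects and enabled by default; check the release notes for your target update before relying on this.)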

Humongous objects that can only be collected at the end of an MPCMC increase the requirement for reserved Free space, or else make a Full GC more likely.

To find out how much Humongous allocation is happening:

Step 1: run the following commands on your gc.log.

Command 1:

grep "source: concurrent humongous allocation" /tmp/gc.log | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' > humoungous_humongoud_size.txt

Command 2:

awk -F',' '{sum+=$1} END {print sum;}' humoungous_humongoud_size.txt

This will give you the total size of the Humongous objects generated in your application.
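Two hedged notes on the commands above. First, the "source: concurrent humongous allocation" text comes from G1's ergonomics output, which on JDK 8 is normally only printed when -XX:+PrintAdaptiveSizePolicy is enabled, so verify that your gc.log actually contains such lines before relying on the grep. Second, the sum is reported in bytes; if megabytes are easier to reason about, a small variation of Command 2 (same assumptions, added here purely for convenience) is:

awk -F',' '{sum+=$1} END {printf "%.1f MB\n", sum/1048576}' humoungous_humongoud_size.txt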

But lastly, if your application has a memory leak, you will have to fix that.
