启用了JVM标志-XX:+ UseDynamicNumberOfGCThreads -XX:+ TraceDynamicGCThreads可以看到否. GC期间的GC线程数.请解释输出日志? [英] JVM flags -XX:+UseDynamicNumberOfGCThreads -XX:+TraceDynamicGCThreads enabled to see no. of GC threads during GC. Please explain the output logs?
问题描述
我们有一个应用程序以集群模式(6个节点)运行Wildfly应用程序服务器. 当触发GC时,有时我们会看到JVM冻结16秒. 如果15秒钟内未收到心跳响应,则该应用程序对时间敏感,并且群集中的其他节点认为该节点已死(JVM在其中暂停). 因此,JVM冻结导致应用程序不稳定. 为了了解GC期间发生的情况,我们启用了热点,安全点日志,并在GC暂停时查看以下跟踪.
We have an application running Wildfly app server in a clustered mode (6 nodes). We are seeing sometimes JVM freeze of 16secs when there is a GC triggered. The application is time sensitive and other nodes in the cluster thinks that node is dead (in which the JVM pause) if the heartbeat response is not received with in 15secs. So, The JVM freeze causing application instability. To understand what is going on during GC, we enabled hotspot, safepoint logs and see the following traces when there is a GC pause.
任何人都可以解释以下参数的含义.
Can anybody explain what is meant by the following parameters.
1.) active_workers(): 13
2.) new_acitve_workers: 13
3.) prev_active_workers: 13
4.) active_workers_by_JT: 3556
5.) active_workers_by_heap_size: 146
环境详细信息: Linux 64位RHEL 7 OpenJDK的1.8 堆大小:12GB(年轻:4GB,使用年限:8GB) CPU核心数:16 VMware ESX 5.1
Environment details: Linux 64bit RHEL 7 OpenJDK 1.8 Heap size: 12GB (Young:4GB, Tenure:8GB) CPU cores: 16 VMware ESX 5.1
JVM参数:
-XX:ThreadStackSize=512
-Xmx12288m
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:MaxPermSize=1024m
-XX:+DisableExplicitGC
-XX:NewSize=4096m
-XX:MaxNewSize=4096m
-XX:ReservedCodeCacheSize=256m
-XX:+UseCodeCacheFlushing
-XX:+UseDynamicNumberOfGCThreads
有没有关于调整这些JVM参数以减少GC暂停时间的建议?
Any suggestions in tuning these JVM parameters to reduce the GC pause time?
GC日志:
GCTaskManager::calc_default_active_workers() : active_workers(): 13 new_acitve_workers: 13 prev_active_workers: 13
active_workers_by_JT: 3556 active_workers_by_heap_size: 146
GCTaskManager::set_active_gang(): all_workers_active() 1 workers 13 active 13 ParallelGCThreads 13
JT: 1778 workers 13 active 13 idle 0 more 0
2016-10-06T07:38:47.281+0530: 48313.522: [Full GC (Ergonomics) DrainStacksCompactionTask::do_it which = 3 which_stack_index = 3/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 7 which_stack_index = 7/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 2 which_stack_index = 2/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 0 which_stack_index = 0/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 11 which_stack_index = 11/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 6 which_stack_index = 6/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 1 which_stack_index = 1/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 12 which_stack_index = 12/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 4 which_stack_index = 4/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 5 which_stack_index = 5/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 9 which_stack_index = 9/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 8 which_stack_index = 8/empty(0) use all workers 1
DrainStacksCompactionTask::do_it which = 10 which_stack_index = 10/empty(0) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 3 region_stack = 0x780be610 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 5 region_stack = 0x780be730 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 7 region_stack = 0x780be850 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 11 region_stack = 0x780bea90 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 1 region_stack = 0x780be4f0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 10 region_stack = 0x780bea00 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 8 region_stack = 0x780be8e0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 4 region_stack = 0x780be6a0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 0 region_stack = 0x780be460 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 2 region_stack = 0x780be580 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 6 region_stack = 0x780be7c0 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 12 region_stack = 0x780beb20 empty (1) use all workers 1
StealRegionCompactionTask::do_it region_stack_index 9 region_stack = 0x780be970 empty (1) use all workers 1
[PSYoungGen: 63998K->0K(4082176K)] [ParOldGen: 8346270K->3657870K(8388608K)] 8410268K->3657870K(12470784K), [Metaspace: 465864K->465775K(1495040K)], 16.0898939 secs]
[Times: user=180.57 sys=2.46, real=16.09 secs]
2016-10-06T07:39:03.373+0530: 48329.615: Total time for which application threads were stopped: 16.2510644 seconds, Stopping threads took: 0.0036805 seconds
安全点日志:
48313.363: ParallelGCFailedAllocation [ 2384 0 2 ] [ 0 0 3 35 16210 ] 0
预先感谢您的帮助.
推荐答案
根据ParallelGCFailedAllocation
和[PSYoungGen: 63998K->0K(4082176K)] [ParOldGen: 8346270K->3657870K(8388608K)] 8410268K->3657870K(12470784K), [Metaspace: 465864K->465775K(1495040K)], 16.0898939 secs]
判断,我们具有以下条件:
Judging by the ParallelGCFailedAllocation
and [PSYoungGen: 63998K->0K(4082176K)] [ParOldGen: 8346270K->3657870K(8388608K)] 8410268K->3657870K(12470784K), [Metaspace: 465864K->465775K(1495040K)], 16.0898939 secs]
we have the following conditions:
- YoungGen几乎是空的(4G中仅占63М)
- OldGen几乎已满(8,3G中仅剩42М)
- JVM试图从YoungGen中移走幸存的对象,或者未能在Survivor空间中分配它们,并决定将其移至OldGen
- OldGen也没有足够的空间(如上所述只有42M),因此触发了FullGC
- 收集完OldGen的FullGC 5G后(8346270K-> 3657870K)
甚至13个并行运行的GC线程都在收集16G的5G线程.由于您只有16个内核,因此添加更多线程不会有太多提高速度的空间.
Even 13 GC threads running in parallel are collecting those 5G for 16 seconds. Since you have only 16 cores there not so many room for speed improvement from adding more threads.
以下可能正在发生:
- 您的对象对于一个YounGen来说寿命太长了,因此您需要切换到CMS/G1,这样它才能更频繁地收集OldGen,并且总花费的时间更少.您将需要根据需要调整
InitiatingHeapOccupancyPercent
.从当前的输出来看,您应该在4G左右启动.尽管是否真的需要那些12G的堆会引起疑问,因为这会遇到堆碎片的问题. - 您的对象寿命短,但太大而无法容纳在Survivor空间中.,因此您需要调整
SurvivorRatio
参数以使其更大.诸如SurvivorRatio = 4之类的东西(在这种情况下,它将变为1G).
- Your objects live too long for a YounGen, therefore you would need to switch to CMS/G1 so it will collect OldGen more frequently and it will take less time in total. You would need to tune
InitiatingHeapOccupancyPercent
as per your needs. Jugding by current output you should initiate somewhere around 4G. Though it will put in question if you really need those 12G of heap, since it would be a subject to heap fragmentation issues. - Your objects are short-lived but too big to be accomodated in Survivor space, therefore you would need to tune
SurvivorRatio
parameter to make it bigger. Something like SurvivorRatio=4 (it will make it 1G in that case).
因此,这实际上取决于您的对象分配模式.最好的方法是在将其应用于生产之前尝试在其他地方使用.
So it really depends on your object allocation pattern. The best approach would be to try it somewhere else before applying it in production.
这篇关于启用了JVM标志-XX:+ UseDynamicNumberOfGCThreads -XX:+ TraceDynamicGCThreads可以看到否. GC期间的GC线程数.请解释输出日志?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!