Tuning garbage collections for low latency

This article looks at how to tune garbage collection for low latency; hopefully it is a useful reference for anyone facing the same problem.

Problem description




I'm looking for arguments as to how best to size the young generation (with respect to the old generation) in an environment where low latency is critical.

My own testing tends to show that latency is lowest when the young generation is fairly large (e.g. -XX:NewRatio < 3); however, I cannot reconcile this with the intuition that the larger the young generation, the longer it should take to garbage collect.
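
For reference (this gloss is mine, not part of the original question): -XX:NewRatio=N sets the old generation to roughly N times the size of the young generation, so a value below 3 leaves at least about a quarter of the heap for the young gen. For example:

    java -Xmx1024m -XX:NewRatio=2 ...    # young gen is roughly one third of the heap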

The application runs on 64-bit Linux, JDK 6.

Memory usage is about 50 megabytes of long-lived objects loaded at startup (a data cache), and from there it's only (many) very short-lived objects being created (with an average lifespan < 1 millisecond).

Some garbage collection cycles take more than 10 milliseconds to run... which looks really disproportionate compared with application latency, which is at most a few milliseconds.
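
For context, pauses like these would typically be observed with HotSpot's standard GC logging flags (a sketch only; the exact log format varies between JDK 6 updates):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...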

Solution

For an application that generates lots of short-lived garbage and nothing long-lived, one approach that can work is a big heap with nearly all of it young gen, nearly all of that eden, and tenuring anything that survives a YG collection more than once.

For example (let's say you had a 32-bit JVM; a full command line is sketched after the list):

  • 3072M heap (Xms and Xmx)
  • 128M tenured (i.e. Xmn 2944m)
  • MaxTenuringThreshold=1
  • SurvivorRatio=190 (i.e. each survivor space is 1/192 of the YG)
  • TargetSurvivorRatio=90 (i.e. fill those survivors as much as possible)
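
Spelled out as a single command line (a sketch only; the main class name is a placeholder and the collector choice is left open):

    java -Xms3072m -Xmx3072m -Xmn2944m \
         -XX:MaxTenuringThreshold=1 \
         -XX:SurvivorRatio=190 \
         -XX:TargetSurvivorRatio=90 \
         com.example.MyApp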

The exact params you would use for this setup depend on what the steady-state size of your working set is (i.e. how much is alive at the time of each collection). The thinking here obviously goes against the normal heap-sizing rules, but then you don't have an app that behaves in the normal way. The idea is that the app is mostly very short-lived garbage plus a bit of static data, so set the JVM up so that the static data gets into tenured quickly, and then have a YG big enough that it doesn't get collected very often, thus minimising the frequency of the pauses. You'd need to twiddle the knobs repeatedly to work out what a good size is for you and how that balances against the size of the pause you get per collection. You might find shorter but more frequent YG pauses are achievable, for example.

You don't say how long your app runs for, but the target here is to have no tenured collections at all for the life of the app. This may be impossible of course, but it's worth aiming for.

However, it's not just the collection algorithm that matters in your case, it is also where the memory is allocated. The NUMA collector (only compatible with the throughput collector and activated with the UseNUMA switch) makes use of the observation that an object is often used purely by the thread that created it, and thus allocates memory accordingly. I'm not sure what it is based on on Linux, but it uses MPO (memory placement optimisation) on Solaris; there are some details on one of the GC guys' blogs.
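
As a sketch (assuming the throughput collector is acceptable for this workload), NUMA-aware allocation is enabled like so:

    java -XX:+UseParallelGC -XX:+UseNUMA ...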

Since you're using a 64-bit JVM, make sure you're using CompressedOops as well (-XX:+UseCompressedOops).

Given that rate of object allocation (possibly some sort of scientific library?) and that lifetime, you should give some consideration to object reuse. One example of a library doing this is the Javolution StackContext.
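
As a minimal sketch of the object-reuse idea (this is not the Javolution API; the class and method names here are invented for illustration), a per-thread scratch object can absorb most of the short-lived allocations:

    // Illustrative only: reuse one mutable scratch object per thread instead of
    // allocating a fresh result object on every call.
    final class Vector3Scratch {
        double x, y, z; // overwritten on each use
    }

    final class VectorOps {
        private static final ThreadLocal<Vector3Scratch> SCRATCH =
                new ThreadLocal<Vector3Scratch>() {
                    @Override protected Vector3Scratch initialValue() {
                        return new Vector3Scratch();
                    }
                };

        // Writes the cross product into the per-thread scratch object and
        // returns it, avoiding a short-lived allocation per call.
        static Vector3Scratch cross(double ax, double ay, double az,
                                    double bx, double by, double bz) {
            Vector3Scratch out = SCRATCH.get();
            out.x = ay * bz - az * by;
            out.y = az * bx - ax * bz;
            out.z = ax * by - ay * bx;
            return out;
        }
    }

The trade-off is that the caller must copy anything it wants to keep out of the scratch object before the next call on the same thread.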

Finally, it's worth noting that GC pauses are not the only STW pauses; you could run with the 6u21 early access build, which has some fixes to the PrintGCApplicationStoppedTime and PrintGCApplicationConcurrentTime switches (which effectively print the time spent at a global safepoint and the time between those safepoints). You can use the TraceSafepointStatistics flag to get some idea of what is causing the JVM to need a safepoint (i.e. a point at which no bytecode is being executed by any thread).
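
A sketch of the relevant flags (all HotSpot-specific; availability and output depend on the exact build, as noted above):

    java -XX:+PrintGCApplicationStoppedTime \
         -XX:+PrintGCApplicationConcurrentTime \
         -XX:+TraceSafepointStatistics \
         ...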

That concludes this article on tuning garbage collections for low latency. Hopefully the recommended answer is helpful, and thanks for supporting IT屋!
