Why/when would you not want Java 8 UseStringDeduplication enabled in the JVM?

Question

Java 8 introduced String Deduplication, which can be enabled by launching the JVM with the -XX:+UseStringDeduplication option. It saves some memory by making equal String objects share the same underlying data instead of keeping duplicates. Of course, its effectiveness varies from program to program depending on how Strings are used, but I think it is safe to say that in general it can be considered beneficial for most applications (if not all), which makes me wonder about a few things:

Why is it not enabled by default? Is it because of the costs associated with deduplication, or simply because G1GC is still considered new?

Are there (or could there be) any edge cases where you would not want to use deduplication?

Recommended answer

Cases where String de-duplication could be harmful include:

  • There are lots of strings but a very low probability of duplicates: the time overhead of looking for duplicates and the space overhead of the de-duping data structure would not be repaid.
  • There is a reasonable probability of duplicates, but most strings die within a couple of GC cycles [1] anyway. The de-duplication is less beneficial if the de-duped strings were going to be GC'ed soon anyway.

(This is not about strings that don't survive the first GC cycle. It would make no sense for the GC to even try to de-dup strings that it knows to be garbage.)
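For the first case above, a rough way to judge the "probability of duplicates" is to take a sample of the strings your application keeps alive and measure how many of them repeat. The following is only a minimal sketch: the class name DuplicateEstimate, the method duplicateFraction and the sample values are made up for illustration, and obtaining a representative sample (for example from a heap dump) is left to you.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateEstimate {

    /**
     * Returns the fraction of strings in the sample that repeat the content
     * of another string in the sample: 0.0 means every string is unique,
     * values close to 1.0 mean almost everything is a duplicate.
     */
    public static double duplicateFraction(List<String> sample) {
        if (sample.isEmpty()) {
            return 0.0;
        }
        // A HashSet keeps one entry per distinct content (equals/hashCode).
        Set<String> distinct = new HashSet<>(sample);
        return 1.0 - (double) distinct.size() / sample.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 6 strings, 3 distinct values.
        List<String> sample = Arrays.asList("GET", "POST", "GET", "GET", "PUT", "POST");
        System.out.println("duplicate fraction: " + duplicateFraction(sample));
    }
}
```

A fraction near zero suggests the first case applies and de-duping would mostly add overhead. A high fraction is necessary but not sufficient, because the second case (short-lived strings) can still cancel the benefit.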

We can only speculate as to why the Java team didn't turn on de-duping by default, but they are in a much better position to make rational (i.e. evidence-based) decisions on this than you and I. My understanding is that they have access to many large real-world applications for benchmarking and trying out the effects of optimizations. They may also have contacts in partner or customer organizations with similarly large code-bases and concerns about efficiency, whom they can ask for feedback on whether optimizations in an early access release work as expected.

1 - This depends on the value of the StringDeduplicationAgeThreshold JVM setting. It defaults to 3, meaning that (roughly) a string has to survive 3 minor collections or a major collection to be considered for de-duping. But anyhow, if a string is de-duped and then found to be unreachable shortly afterwards, the de-duping overheads will not be repaid for that string.
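To see what these two settings are in a running JVM, you can query them through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM where com.sun.management.HotSpotDiagnosticMXBean is available. The class name DedupFlags and the helper optionValue are illustrative, and the "unavailable" fallback is my own convention, not part of any API.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlags {

    /**
     * Returns the current value of a HotSpot VM option as a string, or
     * "unavailable" if this JVM does not expose the option.
     */
    public static String optionValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "unavailable";
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the VM has no option with this name.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = "
                + optionValue("UseStringDeduplication"));
        System.out.println("StringDeduplicationAgeThreshold = "
                + optionValue("StringDeduplicationAgeThreshold"));
    }
}
```

Running it once with and once without -XX:+UseStringDeduplication on the command line is a quick check that the option actually reached the JVM you are measuring.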

If you are asking when you should consider enabling de-duping, my advice would be to try it and see if it helps on a per-application basis. But you need to do some application-level benchmarking (which takes effort!) to be sure that the de-duping is beneficial ...
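As a starting point for "try it and see", here is a toy workload that you could run twice, once with -XX:+UseG1GC -XX:+UseStringDeduplication and once without, and compare the reported heap usage. It is a sketch under several assumptions: the class name DedupWorkload, the string contents, the counts and the sleep duration are all invented, the heap figure is noisy, and de-duping happens in the background so the effect may not be visible immediately. It is not a substitute for benchmarking your real application.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupWorkload {

    /**
     * Builds `count` strings with only `distinct` different contents.
     * Run-time concatenation yields a separate String object each time,
     * so equal contents are duplicated on the heap.
     */
    public static List<String> buildStrings(int count, int distinct) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add("customer-name-" + (i % distinct));
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> retained = buildStrings(1_000_000, 100);

        // Request a few collections so the strings can become candidates
        // for de-duping, then give the background work some time.
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(200);
        }

        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("strings retained: " + retained.size());
        System.out.println("approx. used heap (MB): " + usedMb);
    }
}
```

On Java 8, JEP 192 also describes a -XX:+PrintStringDeduplicationStatistics option that reports what the de-duplication work actually achieved, which is usually more informative than the raw heap number.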

A careful read of JEP 192 would also help you understand the issues, and make a judgment on how they might apply to your Java application.
