不可变集和映射上的 JDK9 随机化 [英] JDK9 randomization on immutable sets and maps

查看:17
本文介绍了不可变集和映射上的 JDK9 随机化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

阅读这个问题Eugene 给出的答案,我发现 JDK9 不可变集和映射会引入随机源,影响它们的遍历.这意味着迭代顺序确实是随机的,至少在 JVM 的不同运行中是这样.

Reading this question and the answer given by Eugene, I found that JDK9 immutable sets and maps will introduce a source of randomness that will affect their traversal. This means that iteration order will indeed be random, at least among different runs of the JVM.

由于规范不保证集合和映射的任何遍历/迭代顺序,这绝对没问题.事实上,代码绝不能依赖于特定于实现的细节,而应该依赖于规范.

As the spec doesn't guarantee any traversal/iteration order for sets and maps, this is absolutely fine. In fact, code must never rely on implementation-specific details, but on the spec instead.

我知道今天,使用 JDK 8,如果我有一个 HashSet 并执行此操作(取自链接的答案):

I know that today, with JDK 8, if I have i.e. a HashSet and do this (taken from the linked answer):

Set<String> wordSet = new HashSet<>(Arrays.asList("just", "a", "test"));

System.out.println(wordSet);

for (int i = 0; i < 100; i++) {
    wordSet.add("" + i);
}

for (int i = 0; i < 100; i++) {
    wordSet.remove("" + i);
}

System.out.println(wordSet);

然后元素的迭代顺序将改变,两个输出将不同.这是因为在集合中添加和删除 100 个元素会改变 HashSet 的内部容量并重新散列元素.这是完全有效的行为.我不是在这里问这个.

Then the iteration order of the elements will change and the two outputs will differ. This is because adding and removing 100 elements to the set changes the internal capacity of the HashSet and rehashes elements. And this is perfectly valid behavior. I'm not asking about this here.

但是,对于 JDK9,如果我这样做:

However, with JDK9, if I do this:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set);

然后,在另一个 JVM 实例中,我运行相同的代码,输出可能不同,因为引入了随机化.

And then, in another instance of the JVM, I run the same code, the outputs can be different, because randomization has been introduced.

到目前为止,我在 youtube 上找到了这个出色的视频(44:55 分钟),其中 Stuart Marks 说这种随机化的一个动机是:

So far, I've found this excellent video in youtube (minute 44:55), in which Stuart Marks says that one motivation for this randomization is:

(...) 人们编写的应用程序无意中依赖于迭代顺序.(...) 所以,无论如何,迭代顺序是一个大问题,我认为有很多代码对迭代顺序有潜在的依赖,但尚未发现.(...) 所以,我们对此的反应是故意随机化新集合中 SetMap 中的迭代顺序.因此,在集合的迭代顺序不可预测但稳定之前,这些是可预测的不可预测的.因此,每次 JVM 启动时,我们都会得到一个随机数,并将其用作与哈希值混合的种子值.所以,如果你运行一个程序,初始化一个集合,然后以任何顺序打印出元素,你会得到一个答案,然后,如果你再次调用 JVM 并运行同一个程序,元素集通常会出现在不同的顺序.所以,这里的想法是 (...) 如果您的代码中存在迭代顺序依赖关系,过去发生的事情是一个新的 JDK 版本发布并且您测试您的代码并且 (...) 它'd 需要花费数小时的调试时间才能将其追溯到迭代顺序的某种更改.这意味着该代码中存在一个依赖于迭代顺序的错误.现在,如果您更频繁地改变迭代顺序,就像每次 JVM 调用一样,那么(我们希望)这种奇怪的行为会更频繁地出现,实际上我们希望在您进行测试时...>

(...) that people write applications that have inadvertent dependencies on iteration order. (...) So, anyway, iteration order is a big deal and I think there's a lot of code out there that has latent dependencies on iteration order that has not been discovered yet. (...) So, our response to this is to deliberately randomize the iteration order in Set and Map in the new collections. So whereas before the iteration order of collections was unpredictable but stable, these are predictably unpredictable. So every time the JVM starts up, we get a random number and we use that at as a seed value that gets mixed in with the hash values. So, if you run a program that initializes a set and then prints out the elements in any order, you get an answer, and then, if you invoke the JVM again and run that same program, the set of elements usually would come out in a different order. So, the idea here is that (...) if there are iteration order dependencies in your code, what used to happen in the past, is a new JDK release came out and you test your code and (...) it'd take hours of debugging to trace it down to some kind of change in iteration order. What that meant was there was a bug in that code that depended on the iteration order. Now, if you vary the iteration order more often, like every JVM invocation, then (we hope) that weird behavior will manifest itself more frequently, and in fact we hope while you're doing testing...

所以,动机很明确,而且这种随机化只会影响新的不可变集合和映射.

So, the motivation is clear, and it's also clear that this randomization will only affect the new immutable sets and maps.

我的问题是:这种随机化还有其他动机吗?它有什么优势?

推荐答案

事实证明随机迭代顺序还有另一个原因.这不是什么大秘密或什么.我以为我在那次演讲中已经解释过了,但也许没有.我可能在 OpenJDK 邮件列表或内部讨论中提到过它.

Well it turns out there is another reason for the randomized iteration order. It's not a big secret or anything. I thought I had explained it in that talk, but maybe not. I probably mentioned it on the OpenJDK mailing lists or perhaps in internal discussions.

无论如何,随机迭代顺序的另一个原因是为未来的实现更改保留灵活性.

In any case, another reason for randomized iteration order is to preserve flexibility for future implementation changes.

事实证明,这比大多数人想象的要大得多.历史上,HashSetHashMap 从未指定特定的迭代顺序.但是,有时需要更改实现以提高性能或修复错误.对迭代顺序的任何更改都会引起用户的强烈不满.多年来,改变迭代顺序产生了很多阻力,这使得 HashMap 的维护变得更加困难.

This turns out to be a bigger deal than most people think. Historically, HashSet and HashMap have never specified a particular iteration order. From time to time, however, the implementation needed to change, to improve performance or to fix bugs. Any change to iteration order generated a lot of flak from users. Over the years, a lot of resistance built up to changing iteration order, and this made maintenance of HashMap more difficult.

要了解为什么会出现这个问题,请考虑一系列不同的策略来管理迭代顺序的稳定性:

To see why this is a problem, consider a spectrum of different policies for managing the stability of iteration order:

  1. 指定迭代顺序,并坚持下去.

  1. Specify the iteration order, and stick to it.

不指定迭代顺序,但隐含地保持迭代顺序稳定.

Leave iteration order unspecified, but implicitly keep iteration order stable.

不指定迭代顺序,但尽可能少地改变迭代顺序.

Leave iteration order unspecified, but change the iteration order as little as possible.

经常更改迭代顺序,例如在更新版本中.

Change the iteration order frequently, e.g., in update releases.

更频繁地更改迭代顺序,例如,从 JVM 的一次运行到下一次.

Change the iteration order more frequently, e.g., from one run of the JVM to the next.

更频繁地更改迭代顺序甚至,例如,从一次迭代到下一次.

Change the iteration order even more frequently, e.g., from one iteration to the next.

在 JDK 1.2 中引入集合时,未指定 HashMap 迭代顺序.LinkedHashMap 以更高的成本提供了稳定的迭代顺序.如果您不需要稳定的迭代顺序,则不必为此付费.这排除了 #1 和 #2.

When collections were introduced in JDK 1.2, HashMap iteration order was unspecified. Stable iteration order was provided by LinkedHashMap at a somewhat higher cost. If you didn't need a stable iteration order, you shouldn't have to pay for it. This ruled out #1 and #2.

在接下来的几个版本中,我们试图保持迭代顺序稳定,即使规范允许它发生变化.没有人喜欢代码损坏时的情况,而且不得不告诉客户他的代码损坏了,因为这取决于迭代顺序,这是非常不愉快的.

For the next several releases, we tried to keep iteration order stable, even though the specification allowed it to change. Nobody likes it when code breaks, and it's pretty unpleasant to have to tell a customer that his code is broken because it depends on iteration order.

所以我们最终采用了策略 #3,尽可能保持迭代顺序稳定,尽管它不时发生变化.例如,我们在 JDK 7u6 中引入了替代哈希(JDK-7118743 的代码审查)和 JDK 8 中的树箱(JEP 180),并且在某些情况下都更改了 HashMap 迭代顺序.在早期版本中,排序也发生了几次变化.有人做了一些考古,发现每个主要 JDK 版本的迭代顺序平均改变一次.

So we ended up with policy #3, keeping iteration order as stable as possible, although it did change from time to time. For example, we introduced alternative hashing in JDK 7u6 (code review for JDK-7118743) and tree bins in JDK 8 (JEP 180), and both changed HashMap iteration order in some circumstances. Ordering also changed a couple times in earlier releases. Somebody did some archaeology and found that iteration order changed an average of once per major JDK release.

这是所有可能世界中最糟糕的.主要版本每两年才发布一次.当一个出来时,每个人的代码都会被破坏.会有很多抱怨和咬牙切齿,人们会修复他们的代码,我们保证永远不会再改变迭代顺序.几年过去了,新代码会在无意中依赖于迭代顺序而编写.然后我们会推出另一个改变迭代顺序的主要版本,这将再次破坏每个人的代码.循环将重新开始.

This was the worst of all possible worlds. Major releases only occurred once every couple years. When one came out, everybody's code would break. There would be much wailing and gnashing of teeth, people would fix their code, and we'd promise to never, ever change iteration order again. A couple years would go by and new code would be written that inadvertently depended on iteration order. Then we'd come out with another major release that changed the iteration order, and this would break everybody's code again. And the cycle would begin anew.

我想避免为新系列重复这个循环.我没有尽可能保持迭代顺序稳定,而是尽可能频繁地更改它.最初,每次 迭代都会更改顺序,但这会带来一些开销.最终我们决定每次 JVM 调用一次.成本是每个表探针的 32 位 XOR 操作,我认为这非常便宜.

I wanted to avoid repeating this cycle for the new collections. Instead of keeping iteration order as stable as possible, I pursued a policy of changing it as frequently as possible. Initially the order changed on every iteration, but this imposed some overhead. Eventually we settled on once per JVM invocation. The cost is a 32-bit XOR operation per table probe, which I think is pretty cheap.

在某种程度上,这与强化"应用程序代码有关.如果更改迭代顺序会破坏代码,那么更频繁地破坏该代码将导致它对这种破坏产生抵抗力.当然,代码本身并不会变得更强大;这需要开发人员付出更多努力才能实现.人们会很合理地抱怨不得不做这项额外的工作.

To a certain extent this is about "toughening up" application code. If changing iteration order breaks code, then breaking that code more frequently will cause it to develop resistance that kind of breakage. Of course, the code doesn't get stronger by itself; it requires more developer effort for this to happen. And people will quite reasonably complain about having to do this additional work.

但是在某种意义上,应用程序代码的强化"是次要的,另一个目标是保留更改实现的自由.保留HashMap 的迭代顺序使得维护变得更加困难.新集合中的随机迭代顺序意味着我们在修改它们时无需担心保留迭代顺序,因此它们更易于维护和增强.

But the "toughening" of application code is in some sense secondary to the other goal of preserving the freedom to change the implementation. Preserving iteration order of HashMap has made it more difficult to maintain. Randomized iteration order in the new collections means that we needn't worry about preserving iteration order when modifying them, so they're easier to maintain and enhance.

例如,当前的实现(Java 9,pre-GA,2017 年 7 月)具有三个基于字段的 Set 实现(Set0Set1>Set2) 和基于数组的实现 (SetN),它使用带有线性探测方案的简单封闭散列.将来,我们可能想要添加一个 Set3 实现,它在三个字段中包含三个元素.或者,我们可能希望将 SetN 的冲突解决策略从线性探测更改为更复杂的策略.如果我们不必处理保留迭代顺序的问题,即使在次要版本中,我们也可以完全重构实现.

For example, the current implementation (Java 9, pre-GA, July 2017) has three field-based implementations of Set (Set0, Set1, and Set2) and an array-based implementation (SetN) that uses a simple closed hashing with linear probing scheme. In the future, we might want to add a Set3 implementation that holds three elements in three fields. Or, we might want to change the collision resolution policy of SetN from linear probing to something more sophisticated. We can completely restructure the implementation, even in minor releases, if we don't have to deal with preserving iteration order.

总而言之,权衡是应用程序开发人员必须做更多的工作来确保他们的代码能够抵抗迭代顺序更改带来的破坏.这很可能是他们在某些时候必须使用 HashMap 完成的工作.这样做的好处是 JDK 有更多机会提供更高的性能和空间效率,每个人都可以从中受益.

In summary, the tradeoff is that application developers have to do more work to make sure their code resists breakage from iteration order change. This is likely work they'd have to do at some point anyway with HashMap. What's gained by this is more opportunities for the JDK to deliver improved performance and space efficiency, from which everybody can benefit.

这篇关于不可变集和映射上的 JDK9 随机化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆