JDK9 randomization on immutable sets and maps

Question

Reading this question and the answer given by Eugene, I found that JDK9 immutable sets and maps will introduce a source of randomness that will affect their traversal. This means that iteration order will indeed be random, at least among different runs of the JVM.

As the spec doesn't guarantee any traversal/iteration order for sets and maps, this is absolutely fine. In fact, code must never rely on implementation-specific details, but on the spec instead.

I know that today, with JDK 8, if I have, for example, a HashSet and do this (taken from the linked answer):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

Set<String> wordSet = new HashSet<>(Arrays.asList("just", "a", "test"));

System.out.println(wordSet);

// Grow the set enough to force the HashSet to resize and rehash its elements...
for (int i = 0; i < 100; i++) {
    wordSet.add("" + i);
}

// ...then shrink it back to the original three elements.
for (int i = 0; i < 100; i++) {
    wordSet.remove("" + i);
}

// Same elements as before, but possibly printed in a different order.
System.out.println(wordSet);

then the iteration order of the elements will change and the two outputs will differ. This is because adding 100 elements to the set and then removing them changes the internal capacity of the HashSet and rehashes its elements. This is perfectly valid behavior, and I'm not asking about that here.

However, with JDK9, if I do this:

Set<String> set = Set.of("just", "a", "test");
System.out.println(set);

Then, if I run the same code in another instance of the JVM, the output can be different, because randomization has been introduced.
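
Of course, if my code genuinely needs a predictable order, I can ask for one explicitly instead of relying on whatever Set.of happens to produce. A minimal sketch (the class name is just for illustration, and natural string ordering is assumed to be acceptable):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class StableOrderExample {
    public static void main(String[] args) {
        Set<String> set = Set.of("just", "a", "test");

        // The iteration order of Set.of is unspecified and varies between JVM runs,
        // so derive an explicit order wherever one is actually required.
        List<String> sorted = new ArrayList<>(set);
        Collections.sort(sorted);                     // natural (alphabetical) order
        Set<String> sortedSet = new TreeSet<>(set);   // SortedSet with a defined order

        System.out.println(sorted);     // always [a, just, test]
        System.out.println(sortedSet);  // always [a, just, test]
    }
}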

So far, I've found this excellent video on YouTube (minute 44:55), in which Stuart Marks says that one motivation for this randomization is:

(...) that people write applications that have inadvertent dependencies on iteration order. (...) So, anyway, iteration order is a big deal and I think there's a lot of code out there that has latent dependencies on iteration order that has not been discovered yet. (...) So, our response to this is to deliberately randomize the iteration order in Set and Map in the new collections. So whereas before the iteration order of collections was unpredictable but stable, these are predictably unpredictable. So every time the JVM starts up, we get a random number and we use that as a seed value that gets mixed in with the hash values. So, if you run a program that initializes a set and then prints out the elements in any order, you get an answer, and then, if you invoke the JVM again and run that same program, the set of elements usually would come out in a different order. So, the idea here is that (...) if there are iteration order dependencies in your code, what used to happen in the past, is a new JDK release came out and you test your code and (...) it'd take hours of debugging to trace it down to some kind of change in iteration order. What that meant was there was a bug in that code that depended on the iteration order. Now, if you vary the iteration order more often, like every JVM invocation, then (we hope) that weird behavior will manifest itself more frequently, and in fact we hope while you're doing testing...

So, the motivation is clear, and it's also clear that this randomization will only affect the new immutable sets and maps.

My question is: are there other motivations for this randomization? What advantages does it bring?

Answer

Well it turns out there is another reason for the randomized iteration order. It's not a big secret or anything. I thought I had explained it in that talk, but maybe not. I probably mentioned it on the OpenJDK mailing lists or perhaps in internal discussions.

In any case, another reason for randomized iteration order is to preserve flexibility for future implementation changes.

This turns out to be a bigger deal than most people think. Historically, HashSet and HashMap have never specified a particular iteration order. From time to time, however, the implementation needed to change, to improve performance or to fix bugs. Any change to iteration order generated a lot of flak from users. Over the years, a lot of resistance built up to changing iteration order, and this made maintenance of HashMap more difficult.

To see why this is a problem, consider a spectrum of different policies for managing the stability of iteration order:

  1. Specify the iteration order, and stick to it.

  2. Leave iteration order unspecified, but implicitly keep iteration order stable.

  3. Leave iteration order unspecified, but change the iteration order as little as possible.

  4. Change the iteration order frequently, e.g., in update releases.

  5. Change the iteration order more frequently, e.g., from one run of the JVM to the next.

  6. Change the iteration order even more frequently, e.g., from one iteration to the next.

When collections were introduced in JDK 1.2, HashMap iteration order was unspecified. Stable iteration order was provided by LinkedHashMap at a somewhat higher cost. If you didn't need a stable iteration order, you shouldn't have to pay for it. This ruled out #1 and #2.
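
The same cost difference is easy to see with the Set counterparts. A small illustrative sketch (the class name is just for illustration): HashSet makes no ordering promise, while LinkedHashSet, like LinkedHashMap, documents a stable insertion order and pays for it with an extra linked list threaded through its entries.

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderPolicyExample {
    public static void main(String[] args) {
        List<String> words = List.of("just", "a", "test");

        // HashSet: iteration order unspecified; it depends on hash codes and capacity.
        Set<String> hashSet = new HashSet<>(words);

        // LinkedHashSet: documented, stable insertion order, paid for with an
        // additional linked list maintained alongside the hash table.
        Set<String> linkedSet = new LinkedHashSet<>(words);

        System.out.println(hashSet);    // some unspecified order
        System.out.println(linkedSet);  // always [just, a, test]
    }
}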

For the next several releases, we tried to keep iteration order stable, even though the specification allowed it to change. Nobody likes it when code breaks, and it's pretty unpleasant to have to tell a customer that his code is broken because it depends on iteration order.

So we ended up with policy #3, keeping iteration order as stable as possible, although it did change from time to time. For example, we introduced alternative hashing in JDK 7u6 (code review for JDK-7118743) and tree bins in JDK 8 (JEP 180), and both changed HashMap iteration order in some circumstances. Ordering also changed a couple times in earlier releases. Somebody did some archaeology and found that iteration order changed an average of once per major JDK release.

This was the worst of all possible worlds. Major releases only occurred once every couple years. When one came out, everybody's code would break. There would be much wailing and gnashing of teeth, people would fix their code, and we'd promise to never, ever change iteration order again. A couple years would go by and new code would be written that inadvertently depended on iteration order. Then we'd come out with another major release that changed the iteration order, and this would break everybody's code again. And the cycle would begin anew.

I wanted to avoid repeating this cycle for the new collections. Instead of keeping iteration order as stable as possible, I pursued a policy of changing it as frequently as possible. Initially the order changed on every iteration, but this imposed some overhead. Eventually we settled on once per JVM invocation. The cost is a 32-bit XOR operation per table probe, which I think is pretty cheap.
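
As a rough illustration of what that one XOR per probe can mean, here is a hypothetical sketch of the idea, not the actual JDK source: a per-JVM random salt is mixed into an element's hash code when choosing where a linear-probe search starts, so the table layout, and therefore the iteration order, changes from one JVM invocation to the next.

import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: a sketch of the concept, not the JDK 9 implementation.
class SaltedProbeSketch {
    // Hypothetical per-JVM salt, chosen once when the class is initialized.
    static final int SALT = ThreadLocalRandom.current().nextInt();

    // Starting index of a linear-probe search in a table of the given length:
    // one XOR mixes the salt into the hash before it is reduced to an index.
    static int probeStart(Object element, int tableLength) {
        return Math.floorMod(element.hashCode() ^ SALT, tableLength);
    }
}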

To a certain extent this is about "toughening up" application code. If changing iteration order breaks code, then breaking that code more frequently will cause it to develop resistance to that kind of breakage. Of course, the code doesn't get stronger by itself; it requires more developer effort for this to happen. And people will quite reasonably complain about having to do this additional work.

But the "toughening" of application code is in some sense secondary to the other goal of preserving the freedom to change the implementation. Preserving iteration order of HashMap has made it more difficult to maintain. Randomized iteration order in the new collections means that we needn't worry about preserving iteration order when modifying them, so they're easier to maintain and enhance.

For example, the current implementation (Java 9, pre-GA, July 2017) has three field-based implementations of Set (Set0, Set1, and Set2) and an array-based implementation (SetN) that uses a simple closed hashing with linear probing scheme. In the future, we might want to add a Set3 implementation that holds three elements in three fields. Or, we might want to change the collision resolution policy of SetN from linear probing to something more sophisticated. We can completely restructure the implementation, even in minor releases, if we don't have to deal with preserving iteration order.

In summary, the tradeoff is that application developers have to do more work to make sure their code resists breakage from iteration order change. This is likely work they'd have to do at some point anyway with HashMap. What's gained by this is more opportunities for the JDK to deliver improved performance and space efficiency, from which everybody can benefit.
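
As a rough sketch of what that extra work often looks like in practice (the class name is just for illustration, and a plain assert stands in for a test framework): compare sets as sets, or impose an explicit order before comparing, rather than depending on the printed iteration order.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class OrderIndependentCheck {
    public static void main(String[] args) {
        Set<String> actual = Set.of("just", "a", "test");

        // Fragile: depends on an unspecified, per-JVM-randomized iteration order.
        // assert actual.toString().equals("[just, a, test]");

        // Robust: Set equality ignores iteration order entirely.
        assert actual.equals(Set.of("a", "test", "just"));

        // Also robust: impose an explicit order before comparing.
        List<String> sorted = new ArrayList<>(actual);
        Collections.sort(sorted);
        assert sorted.equals(List.of("a", "just", "test"));

        System.out.println("checks passed (run with -ea to enable asserts)");
    }
}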
