Explain how JIT reordering works

Question

I have been reading a lot about synchronization in Java and all the problems that can occur. However, what I'm still slightly confused about is how the JIT can reorder a write.

For instance, a simple double check lock makes sense to me:

class Foo {
    private volatile Helper helper = null; // 1
    public Helper getHelper() { // 2
        if (helper == null) { // 3
            synchronized(this) { // 4
                if (helper == null) // 5
                    helper = new Helper(); // 6
            }
        }
        return helper;
    }
}

We use volatile on line 1 to enforce a happens-before relationship. Without it, it is entirely possible for the JIT to reorder our code. For example:

  1. Thread 1 is at line 6 and memory has been allocated for helper; however, the constructor has not yet run because the JIT could reorder our code.

  2. Thread 2 comes in at line 2 and gets an object that is not fully created yet.

I understand this, but I don't fully understand the limitations that the JIT has on reordering.

For instance, say I have a method that creates a MyObject and puts it into a HashMap<String, MyObject> (I know that a HashMap is not thread safe and should not be used in a multithreaded environment, but bear with me). Thread 1 calls createNewObject:

public class MyObject {
    private Double value = null;

    public MyObject(Double value) {
        this.value = value;
    }
}

Map<String, MyObject> map = new HashMap<String, MyObject>();

public void createNewObject(String key, Double val){
    map.put(key, new MyObject( val ));
}

At the same time, thread 2 calls a get on the map.

public MyObject getObject(String key){
    return map.get(key);
}

Is it possible for thread 2 to receive an object from getObject(String key) that is not fully constructed? Something like:

  1. Thread 1: Allocate memory for new MyObject( val )
  2. Thread 1: Place object in map
  3. Thread 2: call getObject(String key)
  4. Thread 1: Finish constructing the new MyObject.

Or will map.put(key, new MyObject( val )) not put an object into the map until it's fully constructed?

I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?

In a nutshell, can it only reorder when creating a new Object and assigning it to a reference variable, as in the double-checked lock? A complete rundown on the JIT might be too much for an SO answer, but what I'm really curious about is how it can reorder a write (like line 6 of the double-checked lock) and what stops it from putting an object that is not fully constructed into a Map.

Answer

WARNING: WALL OF TEXT

The answer to your question comes before the horizontal line. In the second portion of my answer I explain the fundamental problem in more depth; that portion is not related to the JIT, so you can stop at the line if the JIT is all you are interested in. The answer to the second part of your question is at the bottom because it relies on what I describe along the way.

TL;DR The JIT will do whatever it wants and the JMM will allow whatever it wants, and both are valid as long as you let them by writing thread-unsafe code.

NOTE: "initialization" refers to what happens in the constructor; it excludes anything else, such as calling a static init method after construction.


"If the reordering produces results consistent with a legal execution, it is not illegal." (JLS 17.4.5-200)

If the result of a set of actions conforms to a valid chain of execution as per the JMM, then the result is allowed regardless of whether the author intended the code to produce that result or not.

"The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization" (JLS 17.4).

The JIT will reorder whatever it sees fit unless we forbid it through the guarantees of the JMM (in a multithreaded environment).

The details of what the JIT can or will do are nondeterministic. Looking at millions of sample runs will not produce a meaningful pattern because reorderings are situational: they depend on very specific details such as CPU architecture, timings, heuristics, graph size, JVM vendor, bytecode size, etc. We only know that the JIT will assume the code runs in a single-threaded environment whenever it does not need to conform to the JMM. In the end, the JIT matters very little to your multithreaded code. If you want to dig deeper, see this SO answer and do a little research on such topics as IR graphs, the JDK HotSpot source, and compiler articles such as this one. But again, remember that the JIT has very little to do with how your multithreaded code ends up behaving.


In practice, the "object that is not fully created yet" is not a side effect of the JIT but rather of the memory model (JMM). In summary, the JMM is a specification that sets out guarantees about what can and cannot be the result of a given set of actions, where actions are operations that involve shared state. The JMM is more easily understood through higher-level concepts such as atomicity, memory visibility, and ordering, all three of which are components of a thread-safe program.

To demonstrate this: it is highly unlikely that the JIT would modify your first code sample (the DCL pattern) in a way that produces "an object that is not fully created yet." In fact, I believe it is not possible, because the result would not follow the order of execution of a single-threaded program.

So what exactly is the problem here?

The problem is that if the actions aren't ordered by a synchronization order, a happens-before order, etc. (described again by JLS 17.4-17.5), then threads are not guaranteed to see the side effects of performing those actions. Threads might not flush their caches to update the field, or they might observe the writes out of order. Specific to this example, threads are allowed to see the object in an inconsistent state because it is not properly published. I'm sure you have heard of safe publication before if you have ever worked even the tiniest bit with multithreading.

You might ask, well if single-threaded execution cannot be modified by the JIT, why can the multithreaded version be?

Put simply, it's because the thread is allowed to think ("perceive" as usually written in textbooks) that the initialization is out of order due to the lack of proper synchronization.

"If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic" (The "Double-Checked Locking is Broken" Declaration).

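Making the object immutable (all fields final) ensures that any thread that obtains a reference to it after the constructor exits sees its final fields fully initialized, provided this does not escape during construction (JLS 17.5).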
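As a rough sketch of what the quoted passage describes, here is the double-checked lock with a non-volatile field and an immutable Helper. The fields id and name and their values are hypothetical, invented only for illustration. The local variable is also an addition of this sketch: it makes sure the field is read only once on the fast path, so the method cannot return a stale null from a second racy read.

```java
// Hypothetical immutable Helper: every field is final and `this` does not
// escape the constructor, so the final-field guarantees of JLS 17.5 apply.
final class Helper {
    private final int id;
    private final String name;

    Helper(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int id() { return id; }
    String name() { return name; }
}

class Foo {
    // Deliberately NOT volatile: acceptable here only because Helper is immutable.
    private Helper helper;

    Helper getHelper() {
        Helper local = helper;              // single racy read, kept in a local
        if (local == null) {
            synchronized (this) {
                local = helper;             // re-check under the lock
                if (local == null) {
                    local = new Helper(1, "example");
                    helper = local;
                }
            }
        }
        return local;                       // never re-read the field here
    }
}
```

If Helper ever gains a non-final field, this version is no longer safe and the field should go back to being volatile.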

Remember that object construction is always unsynchronized. An object that is being initialized is ONLY visible and safe with respect to the thread that constructed it. In order for other threads to see the initialization, you must publish it safely. Here are those ways:

"There are a few trivial ways to achieve safe publication:

  1. Exchange the reference through a properly locked field (JLS 17.4.5)
  2. Use static initializer to do the initializing stores (JLS 12.4)
  3. Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
  4. Initialize the value into a final field (JLS 17.5)."

(Safe Publication and Safe Initialization in Java)
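The four idioms in the quoted list might look roughly like this in code. Config and Publication are hypothetical names made up for this sketch; only the language features and AtomicReference come from the list itself.

```java
import java.util.concurrent.atomic.AtomicReference;

final class Config {                        // hypothetical payload type
    final int value;
    Config(int value) { this.value = value; }
}

class Publication {
    // 1. Properly locked field: every read and write holds the same lock.
    private Config locked;
    synchronized void setLocked(Config c) { locked = c; }
    synchronized Config getLocked() { return locked; }

    // 2. Static initializer: class initialization (JLS 12.4) publishes it.
    static final Config STATIC = new Config(2);

    // 3. Volatile field, or an AtomicX class that relies on the same rule.
    private volatile Config viaVolatile;
    private final AtomicReference<Config> viaAtomic = new AtomicReference<>();
    void setVolatile(Config c) { viaVolatile = c; }
    Config getVolatile() { return viaVolatile; }
    void setAtomic(Config c) { viaAtomic.set(c); }
    Config getAtomic() { return viaAtomic.get(); }

    // 4. Final field: visible once the enclosing constructor has finished.
    private final Config viaFinal = new Config(4);
    Config getFinal() { return viaFinal; }
}
```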

Safe publication ensures that other threads will be able to see the fully initialized object once publication completes.

Revisiting our idea that threads are only guaranteed to see side effects if the actions are ordered, the reason you need volatile is so that the write to helper in thread 1 is ordered with respect to the read in thread 2. Thread 2 is not allowed to perceive the initialization after the read, because the initialization occurs before the write to helper. The initialization piggybacks on the volatile write: the initialization happens first, THEN the write to the volatile field, and the read must come after both (transitive property).

To conclude, the initialization appears to occur after the object is published only because another thread THINKS that is the order. It will never actually occur after construction because of a JIT optimisation. You can fix this by ensuring proper publication through a volatile field or by making your helper immutable.
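The piggybacking can be reduced to a minimal sketch with one plain field and one volatile flag. The class Piggyback and its member names are hypothetical. The plain write (a) comes before the volatile write (b) in program order, the volatile read (c) that observes true comes after (b), and the plain read (d) comes after (c), so by transitivity (d) must see the value written in (a).

```java
class Piggyback {
    private int payload;                    // plain field, not volatile
    private volatile boolean ready;         // the volatile guard

    void writer() {
        payload = 42;                       // (a) plain write
        ready = true;                       // (b) volatile write, after (a) in program order
    }

    // Returns the payload once the volatile flag has been observed as true.
    int reader() {
        while (!ready) {                    // (c) volatile read
            Thread.yield();
        }
        return payload;                     // (d) plain read, after (c) in program order
    }
}
```

If ready were not volatile, nothing would order (a) against (d), and the reader would be allowed to return 0 or to spin forever.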


Now that I've described the general concepts behind how publication works in the JMM, hopefully it will be easy to understand why your second example won't work.

"I'd imagine that the answer is, it wouldn't put an object into the Map until it is fully constructed (because that sounds awful). So how can the JIT reorder?"

To the constructing thread, it will put it into the map after initialization.

To the reader thread, it can see whatever the hell it wants. (improperly constructed object in HashMap? That is definitely within the realm of possibility).

What you described with your 4 steps is completely legal. There is no ordering between assigning value and adding the object to the map, so thread 2 can perceive the initialization out of order, since MyObject was published unsafely.

You can actually fix this problem just by switching to ConcurrentHashMap, and getObject() will be completely thread safe: the initialization occurs before the put, and both occur before the get, as a result of ConcurrentHashMap being thread safe. However, once you modify the object it becomes a management nightmare, because you need to ensure that each state update is visible and atomic. What if one thread retrieves an object and another thread updates it before the first thread has finished modifying it and putting it back in the map?

T1 -> get() MyObject=30 ------> +1 --------------> put(MyObject=31)
T2 -------> get() MyObject=30 -------> +1 -------> put(MyObject=31)
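The lost update in the timeline above can be reproduced, and avoided, with a small sketch. It uses Integer values instead of MyObject to keep things short, and the class Counters and its method names are hypothetical. The point is that get followed by put is two separate thread-safe steps, while ConcurrentHashMap.merge performs the whole read-modify-write atomically for a given key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class Counters {
    private final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

    // Racy: get and put are each thread safe, but the pair is not atomic,
    // so two threads can both read 30 and both write 31.
    void incrementRacy(String key) {
        Integer old = map.get(key);
        map.put(key, old == null ? 1 : old + 1);
    }

    // Atomic: merge does the read-modify-write as a single step per key.
    void incrementAtomic(String key) {
        map.merge(key, 1, Integer::sum);
    }

    int get(String key) {
        return map.getOrDefault(key, 0);
    }
}
```

With incrementAtomic, concurrent callers never lose an increment; with incrementRacy, the final count can come out lower than the number of calls.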

Alternatively, you could make MyObject immutable, but you still need to make the map a ConcurrentHashMap in order for other threads to see the put; otherwise a thread might cache an old copy, never refresh it, and keep reusing the old version. ConcurrentHashMap ensures that its writes are visible to readers and ensures thread safety. Recalling our 3 prerequisites for thread safety, we get visibility from using a thread-safe data structure, atomicity from using an immutable object, and finally ordering by piggybacking on ConcurrentHashMap's thread safety.

To wrap up this entire answer, I will say that multithreading is a very difficult discipline to master, one that I myself most definitely have not mastered. By understanding the concepts that make a program thread safe and thinking about what the JMM allows and guarantees, you can ensure that your code will do what you want it to do. Bugs in multithreaded code often occur because the JMM allows a counterintuitive result that is within its parameters, not because the JIT is doing performance optimisations. Hopefully you will have learned a little bit more about multithreading if you read everything. Thread safety should be achieved by building a repertoire of thread-safe paradigms rather than by exploiting little inconveniences of the spec (Lea or Bloch, not even sure who said this).
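Putting the two pieces together, a sketch of the question's code with an immutable MyObject and a ConcurrentHashMap might look like the following. The holder class Registry, the accessor getValue, and the plus and add methods are hypothetical additions for illustration; createNewObject and getObject keep the shapes they had in the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MyObject {
    private final Double value;             // final: assigned once, in the constructor

    MyObject(Double value) { this.value = value; }

    Double getValue() { return value; }     // accessor added for illustration

    // "Modification" returns a new instance instead of mutating this one.
    MyObject plus(double delta) { return new MyObject(value + delta); }
}

class Registry {                             // hypothetical holder class
    private final ConcurrentMap<String, MyObject> map = new ConcurrentHashMap<>();

    void createNewObject(String key, Double val) {
        map.put(key, new MyObject(val));
    }

    MyObject getObject(String key) {
        return map.get(key);
    }

    // Atomically swaps in an updated copy; does nothing if the key is absent.
    void add(String key, double delta) {
        map.computeIfPresent(key, (k, old) -> old.plus(delta));
    }
}
```

Because every change replaces the mapping with a fresh immutable instance, a reader holds either the old object or the new one, never a half-updated one.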
