Why does Unsafe.fullFence() not ensure visibility in my example?

Question

I am trying to dive deep into the volatile keyword in Java and set up two testing environments. I believe both are x86_64 and use HotSpot.

Java version: 1.8.0_232
CPU: AMD Ryzen 7 (8 cores)

Java version: 1.8.0_231
CPU: Intel I7

Here is the code:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class Test {

  private boolean flag = true; //left non-volatile intentionally
  private volatile int dummyVolatile = 1;

  public static void main(String[] args) throws Exception {
    Test t = new Test();
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

    Thread t1 = new Thread(() -> {
        while (t.flag) {
          //int b = t.dummyVolatile;
          //unsafe.loadFence();
          //unsafe.storeFence();
          //unsafe.fullFence();
        }
        System.out.println("Finished!");
      });

    Thread t2 = new Thread(() -> {
        t.flag = false;
        unsafe.fullFence();
      });

    t1.start();
    Thread.sleep(1000);
    t2.start();
    t1.join();
  }
}

"Finished!" is never printed, which does not make sense to me. I expected the fullFence in thread 2 to make flag = false globally visible.

From my research, HotSpot uses lock/mfence to implement fullFence on x86. And according to Intel's instruction-set reference manual entry for mfence:

This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.

Even "worse": if I comment out fullFence in thread 2 and un-comment any one of the xxxFence calls in thread 1, the code prints "Finished!". This makes even less sense, because lfence at least is "useless" (a no-op) on x86.

Maybe my source of information is inaccurate, or I am misunderstanding something. Please help, thanks!

Recommended answer

It's not the run-time effect that matters; it's the compile-time effect of forcing the compiler to reload things.

Your t1 loop contains no volatile reads or anything else that could synchronize-with another thread, so there's no guarantee it will ever notice any changes to any variables. i.e. when JITing into asm, the compiler can make a loop that loads the value into a register once, instead of reloading it from memory every time. This is the kind of optimization you always want the compiler to be able to do for non-shared data, which is why the language has rules that let it do this when there's no possible synchronization.
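
For comparison, here is a minimal sketch of the question's program with the flag itself declared volatile, which is the standard way to force the reader to re-load it on every iteration. The class name VolatileFlagDemo and the helper readerFinishes are hypothetical, added only for illustration; the code should compile on Java 8.

```java
public class VolatileFlagDemo {
    // volatile: the JIT may not keep this in a register across loop iterations.
    private volatile boolean flag = true;

    /** Returns true if the reader thread noticed flag == false within the timeout. */
    static boolean readerFinishes(long timeoutMillis) {
        VolatileFlagDemo t = new VolatileFlagDemo();
        Thread t1 = new Thread(() -> {
            while (t.flag) {
                // spin; every iteration performs a fresh volatile read
            }
            System.out.println("Finished!");
        });
        t1.setDaemon(true); // don't keep the JVM alive if the reader never exits
        Thread t2 = new Thread(() -> {
            t.flag = false; // volatile write; no explicit fence needed
        });
        try {
            t1.start();
            Thread.sleep(1000); // let the loop run (and likely get JIT-compiled) first
            t2.start();
            t1.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(readerFinishes(5000));
    }
}
```

With the volatile field, no Unsafe fences are needed on either side: the volatile write in t2 and the volatile reads in t1 are what the Java Memory Model relies on for visibility.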

And then of course the condition can get hoisted out of the loop. So with no barriers or anything, your reader loop can JIT into asm that implements this logic:

if(t.flag) {
   for(;;){}  // infinite loop
}

Besides ordering, the other part of Java volatile is the assumption that other threads may change it asynchronously, so multiple reads can't be assumed to give the same value.

But unsafe.loadFence(); makes the JVM reload t.flag from (cache-coherent) memory every iteration. I don't know if this is required by the Java spec or merely an implementation detail that makes it happen to work.
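
On Java 9 and later (not the 1.8 builds in the question), VarHandle offers a supported alternative to sun.misc.Unsafe fences: instead of relying on a fence's side effect on the compiler, give the accesses to the flag themselves an ordering mode. This is a sketch assuming Java 9+; the class name AcquireReleaseFlag and its helper are hypothetical. Because getAcquire is an ordered access rather than a plain read, the JIT is not expected to hoist it out of the loop the way it can for a plain field.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireReleaseFlag {
    private boolean flag = true; // plain field, but only ever accessed through FLAG

    private static final VarHandle FLAG;

    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseFlag.class, "flag", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Returns true if the reader noticed the release store within the timeout. */
    static boolean readerFinishes(long timeoutMillis) {
        AcquireReleaseFlag t = new AcquireReleaseFlag();
        Thread reader = new Thread(() -> {
            // Acquire load on every iteration instead of a plain read.
            while ((boolean) FLAG.getAcquire(t)) {
                Thread.onSpinWait(); // Java 9+ spin-loop hint; optional
            }
        });
        reader.setDaemon(true); // don't keep the JVM alive if something goes wrong
        Thread writer = new Thread(() -> {
            FLAG.setRelease(t, false); // release store; no separate fence call
        });
        try {
            reader.start();
            Thread.sleep(1000); // let the loop run (and likely get JIT-compiled) first
            writer.start();
            reader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(readerFinishes(5000));
    }
}
```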

If this were C++ with a non-atomic variable (which would be undefined behaviour in C++), you'd see exactly the same effect in a compiler like GCC. _mm_lfence would also be a compile-time full barrier as well as emitting a useless lfence instruction, effectively telling the compiler that all memory might have changed and thus needs to be reloaded. So the compiler can't reorder loads across it or hoist them out of loops.

BTW, I wouldn't be so sure that unsafe.loadFence() even JITs to an lfence instruction on x86. lfence is useless for memory ordering (except for very obscure stuff like fencing NT loads from WC memory, e.g. copying from video RAM, which the JVM can assume isn't happening), so a JVM JITing for x86 could just treat it as a compile-time barrier. That is what C++ compilers do for std::atomic_thread_fence(std::memory_order_acquire);: block compile-time reordering of loads across the barrier, but emit no asm instructions, because the asm memory model of the host running the JVM is already strong enough.

In thread 2, unsafe.fullFence(); is, I think, useless. It just makes that thread wait until earlier stores become globally visible before any later loads/stores can happen. t.flag = false; is a visible side effect that can't be optimized away, so it definitely happens in the JITed asm whether or not a barrier follows it, even though it's not volatile. And it can't be delayed or merged with something else, because there's nothing else in the same thread.

Asm stores always become visible to other threads; the only question is whether the current thread waits for its store buffer to drain before doing more stuff (especially loads) in this thread, i.e. whether it prevents all reordering, including StoreLoad. Java volatile does that, like C++ memory_order_seq_cst (by using a full barrier after every store), but without a barrier it's still a store, like C++ memory_order_relaxed. (Or, when JITing x86 asm, loads/stores are actually as strong as acquire/release.)
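
The StoreLoad reordering mentioned above can be probed with the classic two-thread store-buffer litmus test: each thread stores to one variable, then loads the other. With plain fields the Java Memory Model permits both loads to return 0, and x86 hardware can produce that outcome because each store may still be sitting in its core's store buffer when the other thread loads. With volatile fields that outcome is forbidden. This is a rough sketch with hypothetical names (StoreLoadLitmus, countBothZero); starting fresh threads per iteration makes the racy window small, so the plain-field count may well be 0 on a given run.

```java
public class StoreLoadLitmus {
    // Plain fields: the JMM allows r[0] == 0 && r[1] == 0 (StoreLoad reordering).
    static int px, py;
    // Volatile fields: sequentially consistent, so r[0] == 0 && r[1] == 0 is forbidden.
    static volatile int vx, vy;

    /** Runs the litmus test and counts iterations in which both loads saw 0. */
    static int countBothZero(boolean useVolatile, int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            px = 0; py = 0; vx = 0; vy = 0; // reset; Thread.start() orders this before the threads
            int[] r = new int[2];
            Thread a = new Thread(() -> {
                if (useVolatile) { vx = 1; r[0] = vy; } else { px = 1; r[0] = py; }
            });
            Thread b = new Thread(() -> {
                if (useVolatile) { vy = 1; r[1] = vx; } else { py = 1; r[1] = px; }
            });
            a.start();
            b.start();
            try {
                a.join(); // join() makes r[0] and r[1] visible to this thread
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r[0] == 0 && r[1] == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("plain:    " + countBothZero(false, 10_000));
        System.out.println("volatile: " + countBothZero(true, 10_000));
    }
}
```

Only the volatile count is guaranteed (it must be 0); the plain count depends on timing and hardware.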

Caches are coherent, and the store buffer always drains itself (committing to L1d cache) as fast as it can to make room for more stores to execute.

Caveat: I don't know a lot of Java, and I don't know exactly how unsafe / undefined it is to assign a non-volatile variable in one thread and read it in another with no synchronization. Based on the behaviour you're seeing, it sounds exactly like what you'd see in C++ for the same thing with non-atomic variables (with optimization enabled, like HotSpot always does).

(Based on @Margaret's comment, I updated with some guesswork about how I assume Java synchronization works. If I mis-stated anything, please edit or comment.)

In C++ data races on non-atomic vars are always Undefined Behaviour, but of course when compiling for real ISAs (which don't do hardware race-prevention) the results are sometimes what people wanted.