Least intrusive compile barrier for Java on x86

Posted: 2020/5/8 19:49:58 · Tags: java, performance, memory, x86, barrier


Question

If I have a Java process interacting with some other process via a shared ByteBuffer or similar, what would be the least intrusive equivalent of a compiler barrier in C/C++? No portability is required; I am specifically interested in x86.

For example, I have two processes reading and writing to an area of memory, as per the pseudocode:

p1:
    i = 0
    while true:
      A = 0
      //Write to B
      A = ++i

p2:
    a1 = A
    //Read from B
    a2 = A

    if a1 == a2 and a1 != 0:
      //Read was valid

Due to the strict memory ordering on x86 (stores to separate locations are not reordered with each other, and loads from separate locations are not reordered with each other), this does not require any memory barrier in C++, just a compiler barrier between each write and between each read (e.g. asm volatile).

How can I achieve the same ordering constraint in Java in the least expensive manner? Is there anything less intrusive than writing to a volatile?

Recommended answer

sun.misc.Unsafe.putOrdered (that is, the putOrderedInt, putOrderedLong and putOrderedObject methods) should do what you want: a store without the locked instruction that a volatile write implies on x86. The compiler will not move instructions around it, I believe.

This is the same as lazySet on AtomicInteger and friends, but those classes can't be used directly with a ByteBuffer.

Unlike volatile or the AtomicThings classes, the method applies to the specific writes you use it on, not to the definition of the member, so using it doesn't imply anything for reads.

It looks like you are trying to implement something like a seqlock, meaning you need to avoid reordering between the reads of the version counter, A, and the reads/writes of the data itself. A plain int isn't going to cut it, since the JIT might do all sorts of naughty things. My recommendation would be to use a volatile int for your counter, but then write to it with putOrdered. This way, you don't pay the price for volatile writes (a dozen cycles or more, usually), while still getting the compiler barrier implied by the volatile read (and the hardware barrier for those reads is a no-op on x86, making them fast).
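As a rough illustration of that recommendation, here is a minimal sketch that uses only public API. It assumes the counter is an ordinary on-heap field, and it uses AtomicIntegerFieldUpdater.lazySet as the ordered store, since lazySet is described above as equivalent to putOrdered. The class and method names (VersionCounter, publish, read) are hypothetical, and the sketch does not cover a counter that lives inside a shared ByteBuffer.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

/** Hypothetical version counter: volatile reads, ordered (lazy) writes. */
final class VersionCounter {
    // Declared volatile so that ordinary reads of the field are volatile reads.
    private volatile int version;

    // The updater lets us choose the write flavour per call site
    // instead of always paying for a full volatile write.
    private static final AtomicIntegerFieldUpdater<VersionCounter> VERSION =
            AtomicIntegerFieldUpdater.newUpdater(VersionCounter.class, "version");

    /** Ordered store: not reordered with earlier stores, no full fence afterwards. */
    void publish(int newVersion) {
        VERSION.lazySet(this, newVersion);
    }

    /** Plain volatile read of the counter. */
    int read() {
        return version;
    }

    public static void main(String[] args) {
        VersionCounter counter = new VersionCounter();
        counter.publish(1);
        System.out.println(counter.read()); // same thread, so program order applies: prints 1
    }
}
```

The point of the design is that the cost is chosen per write: readers keep using the field as a normal volatile, while the writer opts into the cheaper store only where it calls publish.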

All that said, I think you are in a grey area here, because lazySet isn't part of the formal memory model and doesn't fit cleanly into happens-before reasoning, so you need a deeper understanding of the actual JIT and hardware implementation to see whether you can combine things in this way.

Finally, even with volatile reads and writes (ignoring lazySet), I don't think your seqlock is sound from the point of view of the Java memory model. A volatile write only sets up a happens-before between that write (together with earlier actions in the writing thread) and later reads on another thread; it sets up nothing between the read and the actions that follow the write in the writing thread. Said another way, it is a unidirectional fence, not a bidirectional one. I believe writes in version N+1 to your shared region can be seen by the reading thread even while it reads A == N twice.

Clarification from the comments:

Volatile only sets up a one-way barrier. It is very similar to the acquire/release semantics used by WinTel in some APIs. For example, assume A, Bv, and C are all initially zero:

Thread 1:
A = 1;   // 1
Bv = 1;  // 2
C = 1;   // 3

Thread 2:

int c = C;  // 4
int b = Bv; // 5
int a = A;  // 6

Here, only Bv is volatile. The two threads are doing something similar in concept to your seqlock writers and readers: thread 1 writes some stuff in one order, and thread 2 reads the same stuff in the reverse order and tries to reason about ordering from that.

If thread 2 has b == 1, then a == 1 always, because (1) happens-before (2) (program order), (5) happens-before (6) (program order), and, most critically, (2) happens-before (5), since (5) read the value written at (2). So in this way the write and read of Bv act like a fence. Things above (2) cannot "move below" (2), and things below (5) cannot "move above" (5). Note, however, that I restricted movement in only one direction for each thread, not both, which brings us to the next example:

Equivalently to the above, you might assume that if a == 0, then c == 0 also, since C is written after A and read before it. However, volatiles don't guarantee this. In particular, the happens-before reasoning above doesn't prevent (3) from being moved above (2) as observed by thread 2, nor does it prevent (4) from being pushed below (5).
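The guaranteed half of this example can be checked with a small test harness. The following is only a sketch: the class name OneWayFenceDemo and its methods are made up for illustration, and the fields a, bv and c stand in for A, Bv and C. It races the two threads repeatedly and counts runs where b == 1 but a == 0, which the happens-before argument rules out. It deliberately does not test the a == 0 implies c == 0 case, because that outcome is not guaranteed and whether a violation actually shows up depends on the JIT and hardware.

```java
/** Hypothetical harness for the A / Bv / C example; only bv is volatile. */
final class OneWayFenceDemo {
    static int a;            // plain field
    static volatile int bv;  // volatile field
    static int c;            // plain field

    /** Runs one writer/reader race; returns true if b == 1 was seen together with a == 0. */
    static boolean violatesGuarantee() {
        a = 0;
        bv = 0;
        c = 0;
        int[] seen = new int[3]; // {c, b, a} as observed by the reader
        Thread writer = new Thread(() -> {
            a = 1;   // (1)
            bv = 1;  // (2) volatile write
            c = 1;   // (3)
        });
        Thread reader = new Thread(() -> {
            seen[0] = c;   // (4)
            seen[1] = bv;  // (5) volatile read
            seen[2] = a;   // (6)
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[1] == 1 && seen[2] == 0;
    }

    /** Counts violations of "b == 1 implies a == 1" over the given number of races. */
    static int countViolations(int runs) {
        int violations = 0;
        for (int i = 0; i < runs; i++) {
            if (violatesGuarantee()) {
                violations++;
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(1000));
    }
}
```

A count of zero here is what the memory model requires, so it is not evidence about the unguaranteed direction; that case needs reasoning, not testing.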

Update:

Let's look at your example specifically.

Here is what I believe can happen, unrolling the write loop in p1.

p1:

i = 0
A = 0
// (p1-1) write data1 to B
A = ++i;  // (p1-2) 1 assigned to A

A=0  // (p1-3)
// (p1-4) write data2 to B
A = ++i;  // (p1-5) 2 assigned to A

p2:

a1 = A // (p2-1)
//Read from B // (p2-2)
a2 = A // (p2-3)

if a1 == a2 and a1 != 0:

Let's say p2 sees 1 for both a1 and a2. This means there is a happens-before from p1-2 (and, by extension, p1-1) to p2-1, and also from p1-2 to p2-3. However, there is no happens-before between anything in p2 and p1-4. So in fact, I believe the read of B at p2-2 can observe the second (perhaps partially completed) write at p1-4, which can "move above" the volatile writes at p1-2 and p1-3.

It's interesting enough that I think you might make a new question on that alone: forget about faster barriers, does this work at all even with volatile?
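For comparison, and not as part of the answer above, here is a sketch of how the question's A/B protocol might be written with explicit fences. It assumes Java 9 or later, where java.lang.invoke.VarHandle provides storeStoreFence and loadLoadFence, a single writer, and a direct ByteBuffer standing in for the shared region. The class name SeqlockSketch, the offsets, and the two-int payload are all hypothetical. The fences are placed to close the gaps described in the update: one keeps the data stores from moving above the A = 0 store, the other keeps the data loads from moving below the second read of A.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical single-writer seqlock over a ByteBuffer, following the A/B pseudocode. */
final class SeqlockSketch {
    // Views 4-byte slots of a ByteBuffer as ints; the index argument is a byte offset.
    private static final VarHandle INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private static final int A_OFFSET = 0; // version counter A
    private static final int B_OFFSET = 4; // data region B: two ints

    private final ByteBuffer buf;
    private int i; // writer-local counter, as in the pseudocode

    SeqlockSketch(ByteBuffer buf) {
        this.buf = buf;
    }

    /** Writer side: A = 0, write B, A = ++i. */
    void write(int x, int y) {
        INT.setOpaque(buf, A_OFFSET, 0);       // mark "write in progress"
        VarHandle.storeStoreFence();           // A = 0 is ordered before the stores to B
        buf.putInt(B_OFFSET, x);
        buf.putInt(B_OFFSET + 4, y);
        INT.setRelease(buf, A_OFFSET, ++i);    // stores to B are ordered before the new version
    }

    /** Reader side: returns {x, y}, or null if the read overlapped a write. */
    int[] tryRead() {
        int a1 = (int) INT.getAcquire(buf, A_OFFSET); // loads of B cannot move above this
        int x = buf.getInt(B_OFFSET);
        int y = buf.getInt(B_OFFSET + 4);
        VarHandle.loadLoadFence();                    // loads of B cannot move below this
        int a2 = (int) INT.getOpaque(buf, A_OFFSET);
        return (a1 == a2 && a1 != 0) ? new int[] {x, y} : null;
    }

    public static void main(String[] args) {
        SeqlockSketch lock = new SeqlockSketch(ByteBuffer.allocateDirect(16));
        lock.write(3, 4);
        int[] data = lock.tryRead();
        System.out.println(data == null ? "retry" : data[0] + "," + data[1]);
    }
}
```

On x86 these fences should cost little more than a compiler barrier, which is close to what the question asks for, but whether the same holds across processes depends on both sides following the same protocol on the same memory layout.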
