What implementation detail makes this code fail so easily?

Question

This question is not about the well-known and documented fact that HashMap is not thread-safe, but about its specific failure modes on HotSpot and JDK code. I am surprised by how readily this code fails with an NPE:

public static void main(String[] args) {
    Map<Integer, Integer> m = new HashMap<>(0, 0.75f);
    IntStream.range(0, 5).parallel().peek(i -> m.put(i, i)).map(m::get).count();
}

There is no mystery as to where the NPE comes from: in the .map(m::get) step while trying to unbox a null. It fails in about 4 out of 5 runs.

On my machine Runtime#availableProcessors() reports 8, so presumably the range of length 5 is split into 5 subtasks, each with just a single member. I also assume my code runs in interpreted mode. It might be calling into JIT-compiled HashMap or Stream methods, but the top level is interpreted, therefore precluding any variation where HashMap state is loaded into thread-local memory (registers/stack), thus delaying the observation of updates by another thread. If some of the five put operations don't literally execute at the same time on different cores, I don't expect them to destroy the HashMap's internal structure. The timing of the individual tasks must be extremely precise given the small amount of work.
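As a side note, the actual split can be made visible by logging which thread handles each element; this is just a small diagnostic sketch of my own (class name arbitrary), not part of the failing snippet:

import java.util.stream.IntStream;

public class SplitCheck {
    public static void main(String[] args) {
        // Prints which commonPool worker (or the main thread) handles each element.
        // With availableProcessors() == 8, the five elements usually end up on
        // different threads, one element per subtask.
        IntStream.range(0, 5).parallel()
            .forEach(i -> System.out.println(i + " -> " + Thread.currentThread().getName()));
    }
}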

Is it really the precise timing (commonPool's threads must be unparked), or is there another route to cause this to fail on Oracle/OpenJDK HotSpot? My current version is

java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

UPDATE: I find that even making just two insertions has a similarly high failure rate:

IntStream.range(0, 2).parallel().peek(i -> m.put(i, i)).map(m::get).count();


Answer

First, it's not failing reliably. I managed to have some runs where no exception occurred. This, however, doesn't imply that the resulting map is correct. It's also possible that each thread witnesses its own value being successfully put, while the resulting map misses several mappings.
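Both effects can be counted with a small harness; this is a sketch of my own (the class name and run count are arbitrary), separate from the debug code shown below:

import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

public class FailureCounter {
    public static void main(String[] args) {
        int npeRuns = 0, lostRuns = 0, runs = 1000;
        for (int r = 0; r < runs; r++) {
            Map<Integer, Integer> m = new HashMap<>(0, 0.75f);
            try {
                IntStream.range(0, 5).parallel().peek(i -> m.put(i, i)).map(m::get).count();
                // no exception, but the map may still have silently lost mappings
                if (m.size() < 5) lostRuns++;
            } catch (NullPointerException ex) {
                // a get returned null and the unboxing in map(m::get) failed
                npeRuns++;
            }
        }
        System.out.println("NPE in " + npeRuns + " runs, lost mappings in "
            + lostRuns + " runs, out of " + runs);
    }
}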

But indeed, failing with a NullPointerException happens quite often. I created the following debug code to illustrate how the HashMap works internally:

// Requires: import java.lang.reflect.AccessibleObject; import java.lang.reflect.Field;
// Puts one mapping and dumps the HashMap's internal table, size and threshold fields.
static <K,V> void debugPut(HashMap<K,V> m, K k, V v) {
    if(m.isEmpty()) debug(m);
    m.put(k, v);
    debug(m);
}
private static <K, V> void debug(HashMap<K, V> m) {
    for(Field f: FIELDS) try {
        System.out.println(f.getName()+": "+f.get(m));
    } catch(ReflectiveOperationException ex) {
        throw new AssertionError(ex);
    }
    System.out.println();
}
static final Field[] FIELDS;
static {
    String[] name={ "table", "size", "threshold" };
    Field[] f=new Field[name.length];
    for (int ix = 0; ix < name.length; ix++) try {
        f[ix]=HashMap.class.getDeclaredField(name[ix]);
    }
    catch (NoSuchFieldException ex) {
        throw new ExceptionInInitializerError(ex);
    }
    AccessibleObject.setAccessible(f, true);
    FIELDS=f;
}

A simple sequential for(int i = 0; i < 5; i++) debugPut(m, i, i); prints:

table: null
size: 0
threshold: 1

table: [Ljava.util.HashMap$Node;@70dea4e
size: 1
threshold: 1

table: [Ljava.util.HashMap$Node;@5c647e05
size: 2
threshold: 3

table: [Ljava.util.HashMap$Node;@5c647e05
size: 3
threshold: 3

table: [Ljava.util.HashMap$Node;@33909752
size: 4
threshold: 6

table: [Ljava.util.HashMap$Node;@33909752
size: 5
threshold: 6

As you can see, due to the initial capacity of 0, three different backing arrays are created even during the sequential operation. Each time the capacity is increased, there is a higher chance that a racy concurrent put misses the array update and creates its own array.

This is especially relevant for the initial state of an empty map and several threads trying to put their first key, as all threads might encounter the initial state of a null table and create their own. Also, even when reading the state of a completed first put, there is a new array created for the second put as well.
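The race on the very first insertion can also be reproduced without streams; the following sketch of my own (class name and run count arbitrary) runs two plain threads that each put one key into a fresh map and counts how often one of the two mappings disappears:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class FirstPutRace {
    public static void main(String[] args) throws InterruptedException {
        int lost = 0, runs = 10_000;
        for (int r = 0; r < runs; r++) {
            Map<Integer, Integer> m = new HashMap<>(0, 0.75f);
            CountDownLatch start = new CountDownLatch(1);
            Thread t1 = new Thread(() -> { await(start); m.put(0, 0); });
            Thread t2 = new Thread(() -> { await(start); m.put(1, 1); });
            t1.start(); t2.start();
            start.countDown();           // release both threads at roughly the same time
            t1.join(); t2.join();
            if (m.size() < 2) lost++;    // one thread's freshly created table won, the other put is gone
        }
        System.out.println("lost one of the two first insertions in " + lost + " of " + runs + " runs");
    }
    private static void await(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException ex) { throw new AssertionError(ex); }
    }
}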

But step-by-step debugging revealed even more chances of breaking:

Inside the method putVal, we see near the end:

++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;

In other words, after the successful insertion of a new key, the table gets resized if the new size exceeds the threshold. So on the first put, resize() is called at the beginning because the table is null, and since your specified initial capacity is 0, i.e. too low to store even one mapping, the new capacity will be 1 and the new threshold will be 1 * loadFactor == 1 * 0.75f == 0.75f, rounded down to 0. So right at the end of the first put, the new threshold is exceeded and another resize() operation is triggered. So with an initial capacity of 0, the first put already creates and populates two arrays, which gives a much higher chance of breakage if multiple threads perform this action concurrently, all encountering the initial state.
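This double resize can be verified directly; the following sketch of my own (class name arbitrary) reads the same table and threshold fields as the debug code above, right after a single put:

import java.lang.reflect.Field;
import java.util.HashMap;

public class FirstPutResizes {
    public static void main(String[] args) throws ReflectiveOperationException {
        HashMap<Integer, Integer> m = new HashMap<>(0, 0.75f);
        m.put(0, 0);
        Field table = HashMap.class.getDeclaredField("table");
        Field threshold = HashMap.class.getDeclaredField("threshold");
        table.setAccessible(true);
        threshold.setAccessible(true);
        // Expected on this JDK 8 HashMap: table length 2 and threshold 1,
        // i.e. the single put went through two resize() calls (capacity 1, then 2).
        System.out.println("table length: " + ((Object[]) table.get(m)).length
            + ", threshold: " + threshold.get(m));
    }
}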

And there is another point. Looking into the resize() operation we see the lines:

 @SuppressWarnings({"rawtypes","unchecked"})
 Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
 table = newTab;
 if (oldTab != null) {
     … (transfer old contents to new array)

In other words, the new array reference is stored into the heap before it has been populated with the old entries, so even without reordering of reads and writes, there is a chance that another thread reads that reference without seeing the old entries, including the one it has written itself previously. Actually, optimizations reducing the heap access may lower the chances of a thread not seeing its own update in an immediately following query.
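One way to make this visible without the unboxing NPE is to read the key back on the same thread that has just written it and report the null instead of throwing; again, this is only a sketch of my own (class name and run count arbitrary):

import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

public class OwnUpdateVisibility {
    public static void main(String[] args) {
        for (int r = 0; r < 100; r++) {
            final int run = r;
            Map<Integer, Integer> m = new HashMap<>(0, 0.75f);
            IntStream.range(0, 5).parallel().forEach(i -> {
                m.put(i, i);              // write on this worker thread
                Integer back = m.get(i);  // immediately read the same key back
                if (back == null)
                    System.out.println("run " + run + ": " + Thread.currentThread().getName()
                        + " does not see its own put of key " + i);
            });
        }
    }
}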

Still, it must also be noted that the assumption that everything runs interpreted here is unfounded. Since HashMap is used internally by the JRE as well, even before your application starts, there is also a chance of encountering already compiled code when using HashMap.
