Why is a direct memory 'array' slower to clear than a usual Java array?
Question
I've set up a JMH benchmark to measure which is faster: Arrays.fill with null, System.arraycopy from a null array, zeroing a DirectByteBuffer, or zeroing an unsafe memory block, in an attempt to answer this question.
Let's put aside that zeroing directly allocated memory is a rare case, and discuss the results of my benchmark.
Here's the JMH benchmark snippet (full code available via a gist), including the unsafe.setMemory case suggested by @apangin in the original post, and the byteBuffer.put(byte[], offset, length) and longBuffer.put(long[], offset, length) cases suggested by @jan-schaefer:

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void arrayFill() {
        Arrays.fill(objectHolderForFill, null);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void arrayCopy() {
        System.arraycopy(nullsArray, 0, objectHolderForArrayCopy, 0, objectHolderForArrayCopy.length);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void directByteBufferManualLoop() {
        while (referenceHolderByteBuffer.hasRemaining()) {
            referenceHolderByteBuffer.putLong(0);
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void directByteBufferBatch() {
        referenceHolderByteBuffer.put(nullBytes, 0, nullBytes.length);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void directLongBufferManualLoop() {
        while (referenceHolderLongBuffer.hasRemaining()) {
            referenceHolderLongBuffer.put(0L);
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void directLongBufferBatch() {
        referenceHolderLongBuffer.put(nullLongs, 0, nullLongs.length);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void unsafeArrayManualLoop() {
        long addr = referenceHolderUnsafe;
        long pos = 0;
        for (int i = 0; i < size; i++) {
            unsafe.putLong(addr + pos, 0L);
            pos += 1 << 3;
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void unsafeArraySetMemory() {
        unsafe.setMemory(referenceHolderUnsafe, size * 8, (byte) 0);
    }

Here's what I got (Java 1.8, JMH 1.13, Core i3-6100U 2.30 GHz, Win10):
100 elements

    Benchmark                                       Mode      Cnt    Score    Error  Units
    ArrayNullFillBench.arrayCopy                    sample  5234029   39,518 ±  0,991  ns/op
    ArrayNullFillBench.directByteBufferBatch        sample  6271334   43,646 ±  1,523  ns/op
    ArrayNullFillBench.directLongBufferBatch        sample  4615974   45,252 ±  2,352  ns/op
    ArrayNullFillBench.arrayFill                    sample  4745406   76,997 ±  3,547  ns/op
    ArrayNullFillBench.unsafeArrayManualLoop        sample  5980381   78,811 ±  2,870  ns/op
    ArrayNullFillBench.unsafeArraySetMemory         sample  5985884   85,062 ±  2,096  ns/op
    ArrayNullFillBench.directLongBufferManualLoop   sample  4697023  116,242 ±  2,579  ns/op  WOW
    ArrayNullFillBench.directByteBufferManualLoop   sample  7504629  208,440 ± 10,651  ns/op  WOW

I excluded all the loop implementations (except arrayFill, kept for scale) from the further tests.

1000 elements

    Benchmark                                  Mode      Cnt    Score    Error  Units
    ArrayNullFillBench.arrayCopy               sample  6780681  184,516 ± 14,036  ns/op
    ArrayNullFillBench.directLongBufferBatch   sample  4018778  293,325 ±  4,074  ns/op
    ArrayNullFillBench.directByteBufferBatch   sample  4063969  313,171 ±  4,861  ns/op
    ArrayNullFillBench.arrayFill               sample  6862928  518,886 ±  6,372  ns/op

10000 elements

    Benchmark                                  Mode      Cnt     Score    Error  Units
    ArrayNullFillBench.arrayCopy               sample  2551851  2024,543 ± 12,533  ns/op
    ArrayNullFillBench.directLongBufferBatch   sample  2958517  4469,210 ± 10,376  ns/op
    ArrayNullFillBench.directByteBufferBatch   sample  2892258  4526,945 ± 33,443  ns/op
    ArrayNullFillBench.arrayFill               sample  5689507  5028,592 ±  9,074  ns/op

Could you please clarify the following questions:
1. Why is `unsafeArraySetMemory` a bit slower than `unsafeArrayManualLoop`?
2. Why is directByteBuffer 2.5x-5x slower than the others?
Recommended answer
Why is unsafeArraySetMemory a bit slower than unsafeArrayManualLoop?
My guess is that it is not as well optimised for setting an exact multiple of longs. It has to check whether the length is not quite a multiple of 8.
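
To make that guess concrete: a general-purpose fill over an arbitrary byte count has to split the range into whole 8-byte words plus a leftover tail, while a loop known to write only longs needs no such split. The sketch below is not how `Unsafe.setMemory` is actually implemented. `GeneralFill` and `fill` are hypothetical names, and a direct `ByteBuffer` with absolute puts stands in for the raw memory block.

```java
import java.nio.ByteBuffer;

final class GeneralFill {
    /** Fills `bytes` bytes starting at `offset` with `value`, using 8-byte writes where possible. */
    static void fill(ByteBuffer buf, int offset, int bytes, byte value) {
        // Replicate the byte into all eight byte lanes of a long.
        long pattern = (value & 0xFFL) * 0x0101010101010101L;
        int words = bytes >>> 3;          // number of whole 8-byte words
        int pos = offset;
        for (int i = 0; i < words; i++) {
            buf.putLong(pos, pattern);    // absolute put: does not touch buf.position()
            pos += 8;
        }
        // Extra work a "longs only" loop never needs: the 0..7 leftover bytes.
        int tail = bytes & 7;
        for (int i = 0; i < tail; i++) {
            buf.put(pos + i, value);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(32);
        for (int i = 0; i < 32; i++) buf.put(i, (byte) 1);
        fill(buf, 0, 19, (byte) 0);       // 2 whole words + 3 tail bytes
        System.out.println(buf.get(18) + " " + buf.get(19)); // prints "0 1"
    }
}
```

When the length is always `size * 8`, the tail loop never runs, but a general routine still pays for computing and testing it.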
Why is directByteBuffer an order of magnitude slower than the others?
An order of magnitude would be around 10x; it is about 2.5x slower. It has to bounds-check every access and update a field instead of a local variable.
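
The difference between the two loop shapes can be sketched as follows. `TinyLongBuffer` is a hypothetical, much-simplified model of a buffer with a relative `put`, not the real `java.nio` implementation. It only shows where the per-call bounds check and the field update sit, compared with a plain loop over a local index.

```java
import java.nio.BufferOverflowException;

final class CursorCost {
    /** Hypothetical, simplified stand-in for a buffer with a relative put. */
    static final class TinyLongBuffer {
        private final long[] data;
        private int position;                 // the cursor is a field, not a local

        TinyLongBuffer(int capacity) { data = new long[capacity]; }

        boolean hasRemaining() { return position < data.length; }

        void put(long v) {
            if (position >= data.length) {    // bounds check on every single call
                throw new BufferOverflowException();
            }
            data[position++] = v;             // read and write back the field
        }

        void rewind() { position = 0; }

        long get(int i) { return data[i]; }
    }

    /** Same shape as the manual-loop benchmarks: hasRemaining() plus relative put. */
    static void clearViaRelativePut(TinyLongBuffer buf) {
        buf.rewind();
        while (buf.hasRemaining()) {
            buf.put(0L);
        }
    }

    /** Plain counted loop: the index is a local that the JIT can often keep in a register. */
    static void clearViaLocalIndex(long[] data) {
        for (int i = 0; i < data.length; i++) {
            data[i] = 0L;
        }
    }
}
```

Both methods produce the same result. Whether the JIT manages to remove the extra work in the first one depends on the JVM version and on inlining, so it is worth measuring rather than assuming.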
NOTE: I have found that the JVM doesn't always unroll loops that use Unsafe. You might try unrolling the loop yourself to see if it helps.
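
A minimal sketch of manual unrolling, under stated assumptions: it uses absolute `putLong` on a direct `ByteBuffer` rather than `Unsafe`, so that it runs without internal APIs. The same shape would apply to `unsafe.putLong(addr + pos, 0L)` in the benchmark. `UnrolledZero` and `zeroLongs` are hypothetical names, and the unroll factor of 4 is arbitrary.

```java
import java.nio.ByteBuffer;

final class UnrolledZero {
    /** Zeroes the first `longs` 8-byte slots of `buf`, four slots per iteration. */
    static void zeroLongs(ByteBuffer buf, int longs) {
        int i = 0;
        int limit = longs - 3;                // last start index with 4 slots left
        for (; i < limit; i += 4) {
            int base = i << 3;                // byte offset of slot i
            buf.putLong(base, 0L);
            buf.putLong(base + 8, 0L);
            buf.putLong(base + 16, 0L);
            buf.putLong(base + 24, 0L);
        }
        // Remainder loop for the final 0..3 slots.
        for (; i < longs; i++) {
            buf.putLong(i << 3, 0L);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 10);
        for (int i = 0; i < buf.capacity(); i++) buf.put(i, (byte) 1);
        zeroLongs(buf, 10);
        System.out.println(buf.getLong(72)); // prints "0"
    }
}
```

Whether this beats the simple loop is not guaranteed. It should be checked with the same JMH setup as the other cases.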
NOTE: Native code can use 128-bit XMM instructions, and increasingly does, which is why the copy might be so fast. Access to XMM instructions may come in Java 10.