为什么直接内存“数组"比普通的Java数组清除起来更慢? [英] Why direct memory 'array' is slower to clear than a usual Java array?

查看:124
本文介绍了为什么直接内存“数组"比普通的Java数组清除起来更慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经建立了一个JMH基准测试,以测量使用null,System.arraycopy来自空数组,置零DirectByteBuffer或置零unsafe内存块来尝试回答

I've set up a JMH benchmark to measure what would be faster Arrays.fill with null, System.arraycopy from a null array, zeroying a DirectByteBuffer or zeroying an unsafe memory block trying to answer this question Let's put aside that zeroying a directly allocated memory is a rare case, and discuss the results of my benchmark.

这是JMH基准代码段(可通过gist获得的完整代码),其中包括unsafe.setMemory @apangin在原始帖子中建议的情况,byteBuffer.put(byte[], offset, length)longBuffer.put(long[], offset, length)在@ jan-schaefer建议的情况下:

Here's the JMH benchmark snippet (full code available via a gist) including unsafe.setMemory case as suggested by @apangin in the original post, byteBuffer.put(byte[], offset, length) and longBuffer.put(long[], offset, length) as suggested by @jan-schaefer:

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void arrayFill() {
    Arrays.fill(objectHolderForFill, null);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void arrayCopy() {
    System.arraycopy(nullsArray, 0, objectHolderForArrayCopy, 0, objectHolderForArrayCopy.length);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directByteBufferManualLoop() {
    while (referenceHolderByteBuffer.hasRemaining()) {
        referenceHolderByteBuffer.putLong(0);
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directByteBufferBatch() {
    referenceHolderByteBuffer.put(nullBytes, 0, nullBytes.length);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directLongBufferManualLoop() {
    while (referenceHolderLongBuffer.hasRemaining()) {
        referenceHolderLongBuffer.put(0L);
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directLongBufferBatch() {
    referenceHolderLongBuffer.put(nullLongs, 0, nullLongs.length);
}


@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void unsafeArrayManualLoop() {
    long addr = referenceHolderUnsafe;
    long pos = 0;
    for (int i = 0; i < size; i++) {
        unsafe.putLong(addr + pos, 0L);
        pos += 1 << 3;
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void unsafeArraySetMemory() {
    unsafe.setMemory(referenceHolderUnsafe, size*8, (byte) 0);
}

这就是我得到的(Java 1.8,JMH 1.13,Core i3-6100U 2.30 GHz,Win10):

Here's what I got (Java 1.8, JMH 1.13, Core i3-6100U 2.30 GHz, Win10):

100 elements
Benchmark                                       Mode      Cnt   Score   Error    Units
ArrayNullFillBench.arrayCopy                   sample  5234029  39,518 ± 0,991    ns/op
ArrayNullFillBench.directByteBufferBatch       sample  6271334  43,646 ± 1,523    ns/op
ArrayNullFillBench.directLongBufferBatch       sample  4615974  45,252 ± 2,352    ns/op
ArrayNullFillBench.arrayFill                   sample  4745406  76,997 ± 3,547    ns/op
ArrayNullFillBench.unsafeArrayManualLoop       sample  5980381  78,811 ± 2,870    ns/op
ArrayNullFillBench.unsafeArraySetMemory        sample  5985884  85,062 ± 2,096    ns/op
ArrayNullFillBench.directLongBufferManualLoop  sample  4697023  116,242 ± 2,579   ns/op WOW
ArrayNullFillBench.directByteBufferManualLoop  sample  7504629  208,440 ± 10,651  ns/op WOW

I skipped all the loop implementations (except arrayFill for scale) from further tests

1000 elements
Benchmark                                 Mode      Cnt    Score   Error    Units
ArrayNullFillBench.arrayCopy              sample  6780681  184,516 ± 14,036  ns/op
ArrayNullFillBench.directLongBufferBatch  sample  4018778  293,325 ± 4,074   ns/op
ArrayNullFillBench.directByteBufferBatch  sample  4063969  313,171 ± 4,861   ns/op
ArrayNullFillBench.arrayFill              sample  6862928  518,886 ± 6,372   ns/op

10000 elements
Benchmark                                 Mode      Cnt     Score   Error    Units
ArrayNullFillBench.arrayCopy              sample  2551851  2024,543 ± 12,533  ns/op
ArrayNullFillBench.directLongBufferBatch  sample  2958517  4469,210 ± 10,376  ns/op
ArrayNullFillBench.directByteBufferBatch  sample  2892258  4526,945 ± 33,443  ns/op
ArrayNullFillBench.arrayFill              sample  5689507  5028,592 ± 9,074   ns/op

能否请您澄清以下问题:

Could you please clarify the following questions:

1. Why `unsafeArraySetMemory` is a bit but slower than `unsafeArrayManualLoop`?
2. Why directByteBuffer is 2.5X-5X slower than others?

推荐答案

为什么unsafeArraySetMemory有点比unsafeArrayManualLoop慢一些?

Why unsafeArraySetMemory is a bit but slower than unsafeArrayManualLoop?

我的猜测是,对于设置多个多头的情况,它并没有得到很好的优化.它必须检查您是否有东西,不是8的倍数.

My guess is that it not as well optimised for setting exactly multiple longs. It has to check whether you have something, not quite a multiple of 8.

为什么directByteBuffer比其他的慢了一个数量级?

Why directByteBuffer is by an order of magnitude slower than others?

一个数量级将是10倍左右,慢了约2.5倍.它必须边界检查每次访问并更新字段而不是局部变量.

An order of magnitude would be around 10x, it is about 2.5x slower. It has to bounds check every access and update a field instead of a local variable.

注意:我发现JVM并不总是使用Unsafe循环展开代码.您可以尝试自己做,看看是否有帮助.

NOTE: I have found the JVM doesn't always loop unroll code with Unsafe. You might try doing that yourself to see if it helps.

注意:本机代码可以使用XMM 128位指令,并且正在越来越多地使用它,这就是为什么复制速度可能如此之快的原因. Java 10中可能会访问XMM指令.

NOTE: Native code can use XMM 128 bit instructions and is using this increasingly which is why the copy might be so fast. Access to XMM instruction may come in Java 10.

这篇关于为什么直接内存“数组"比普通的Java数组清除起来更慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆