Compare Direct and Non-Direct ByteBuffer get/put operations

Question

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then (for writes) update the direct ByteBuffer fully from the byte array?

Solution

Is get/put from a non-direct ByteBuffer faster than get/put from a direct ByteBuffer?

If you are comparing a heap buffer with a direct buffer that does not use the native byte order (most systems are little-endian, while the default for a direct ByteBuffer is big-endian), the performance is very similar.

If you use native-ordered byte buffers, the performance can be significantly better for multi-byte values. For byte it makes little difference no matter what you do.
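As a quick illustration of the byte-order point (a minimal sketch; the class name is mine, everything else is the standard java.nio API):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderDemo {
    public static void main(String[] args) {
        // Both heap and direct buffers default to big-endian...
        System.out.println(ByteBuffer.allocate(64).order());       // BIG_ENDIAN
        System.out.println(ByteBuffer.allocateDirect(64).order()); // BIG_ENDIAN
        // ...while most modern hardware is little-endian.
        System.out.println(ByteOrder.nativeOrder());
        // Matching the native order lets a multi-byte put/get map to a
        // plain memory access with no byte swapping.
        ByteBuffer bb = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());
        bb.putInt(0, 42);
    }
}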

In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM-dependent; AFAIK the Android VM also treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine-code instruction, i.e. they don't have the overhead of a JNI call.
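If you want to verify this yourself, HotSpot can dump the JIT-compiled code with its diagnostic flags (these are standard HotSpot options, but they need the hsdis disassembler library installed, and the class name here is a placeholder):

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourBenchmark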

In fact, if you are into micro-tuning, you may find that most of the time of a ByteBuffer getXxxx or setXxxx is spent in the bounds checking, not the actual memory access. For this reason I still use Unsafe directly when I have to for maximum performance (note: this is discouraged by Oracle).

If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then (for writes) update the direct ByteBuffer fully from the byte array?

I would hate to see what that is better than. ;) It sounds very complicated.

Often the simplest solutions are better and faster.


You can test this yourself with this code.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferBenchmark {
    public static void main(String... args) {
        // Two equally sized direct buffers in the platform's native byte order.
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        // Repeat the test so the JIT has a chance to compile the hot loop.
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        bb1.clear();
        bb2.clear();
        long start = System.nanoTime();
        // Copy bb1 into bb2 one int at a time through the public API.
        while (bb2.remaining() > 0)
            bb2.putInt(bb1.getInt());
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2; // one getInt plus one putInt per 4 bytes
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

I am pretty sure a JNI call takes longer than 1.2 ns. (The slower first and third runs are the JIT still warming up; the steady-state 1.2 ns figure is the one that matters.)


To demonstrate that it's not the "JNI" call but the guff around it which causes the delay, you can write the same loop using Unsafe directly.

import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import sun.misc.Unsafe;
import sun.nio.ch.DirectBuffer;

public class UnsafeBenchmark {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        Unsafe unsafe = getTheUnsafe();
        long start = System.nanoTime();
        // Raw off-heap base addresses of the two direct buffers.
        long addr1 = ((DirectBuffer) bb1).address();
        long addr2 = ((DirectBuffer) bb2).address();
        // Same int-by-int copy, but with no bounds checks or position bookkeeping.
        for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
            unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }

    public static Unsafe getTheUnsafe() {
        try {
            // Unsafe's singleton instance lives in a private static field of the same name.
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

So you can see that the native call is much faster than you might expect for a JNI call. The main reason for this delay could be the L2 cache speed. ;)

All tests were run on a 3.3 GHz i3.
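A caveat if you try the Unsafe version on a current JDK: sun.nio.ch.DirectBuffer and sun.misc.Unsafe are internal APIs, so on JDK 9 and later you will likely need a flag such as --add-exports java.base/sun.nio.ch=ALL-UNNAMED for it to compile and run.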
