When to use Array, Buffer or direct Buffer

Question

While writing a Matrix class for use with OpenGL libraries, I came across the question of whether to use Java arrays or a Buffer strategy to store the data (JOGL offers direct-buffer copy for Matrix operations). To analyze this, I wrote a small performance test program that compares the relative speeds of loop and bulk operations on Arrays vs Buffers vs direct Buffers.

I'd like to share my results with you here (as I find them rather interesting). Please feel free to comment and/or point out any mistakes.
The code can be viewed at pastebin.com/is7UaiMV.

Notes

  • Loop-read array is implemented as A[i] = B[i], as otherwise the JIT optimizer will completely remove that code; an actual var = A[i] read performs pretty much the same (see the sketch after this list).

  • In the sample result for an array size of 10,000 it is very likely that the JIT optimizer has replaced the looped array access with a System.arraycopy-like implementation.

  • There is no bulk-get buffer->buffer as Java implements A.get(B) as B.put(A), therefore the results would be the same as the bulk-put results.
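
A minimal sketch of the loop-read measurement described in the first note (SIZE, ITERATIONS and the timing output are illustrative; the linked pastebin code is the authoritative version):

// Copying B into A keeps the JIT from eliminating the loop as dead code;
// a bare read with no visible side effect could be removed entirely.
final int SIZE = 16;
final int ITERATIONS = 10_000_000;
float[] a = new float[SIZE];
float[] b = new float[SIZE];

long start = System.nanoTime();
for (int iter = 0; iter < ITERATIONS; iter++) {
    for (int i = 0; i < SIZE; i++) {
        a[i] = b[i];   // loop-read: forces a real read of b and a write to a
    }
}
System.out.printf("Loop-read array: %.2f ms%n", (System.nanoTime() - start) / 1e6);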

Conclusion

In almost all situations it is strongly recommended to use plain Java arrays. Not only is the put/get speed massively faster, the JIT is also able to perform much better optimizations on the resulting code.

Buffers should only be used if both of the following apply:

  • You need to process large amounts of data.
  • That data is mostly or always bulk-processed.

Note that a non-direct buffer is backed by a Java array that holds the buffer's content. It is recommended to do operations on this backing array instead of looping put/get.
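
As a rough illustration of both recommendations (the buffer size and the doubling operation are made up for the example), a non-direct FloatBuffer can be filled with a bulk put and then worked on through its backing array:

import java.nio.FloatBuffer;

float[] src = new float[1_000];
FloatBuffer buffer = FloatBuffer.allocate(src.length);   // non-direct, array-backed

buffer.put(src);                          // bulk-put array->buffer
buffer.clear();

if (buffer.hasArray()) {
    float[] backing = buffer.array();     // the Java array backing the buffer
    for (int i = 0; i < backing.length; i++) {
        backing[i] *= 2f;                 // element-wise work on the backing array
    }                                     // instead of looping buffer.put()/get()
}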

Direct buffers should only be used if you are concerned about memory usage and never access the underlying data. They are slightly slower than non-direct buffers, and much slower if the underlying data is accessed, but they use less memory. In addition, there is extra overhead for converting non-byte data (like float arrays) into bytes when using a direct buffer.

For more details see here:

Sample results

Note: Percentage is only for ease of reading and has no real meaning.

Using arrays of size 16 with 10,000,000 iterations...

-- Array tests: -----------------------------------------

Loop-write array:           87.29 ms  11,52%
Arrays.fill:                64.51 ms   8,51%
Loop-read array:            42.11 ms   5,56%
System.arraycopy:           47.25 ms   6,23%

-- Buffer tests: ----------------------------------------

Loop-put buffer:           603.71 ms  79,65%
Index-put buffer:          536.05 ms  70,72%
Bulk-put array->buffer:    105.43 ms  13,91%
Bulk-put buffer->buffer:    99.09 ms  13,07%

Bulk-put bufferD->buffer:   80.38 ms  10,60%
Loop-get buffer:           505.77 ms  66,73%
Index-get buffer:          562.84 ms  74,26%
Bulk-get buffer->array:    137.86 ms  18,19%

-- Direct buffer tests: ---------------------------------

Loop-put bufferD:          570.69 ms  75,29%
Index-put bufferD:         562.76 ms  74,25%
Bulk-put array->bufferD:   712.16 ms  93,96%
Bulk-put buffer->bufferD:   83.53 ms  11,02%

Bulk-put bufferD->bufferD: 118.00 ms  15,57%
Loop-get bufferD:          528.62 ms  69,74%
Index-get bufferD:         560.36 ms  73,93%
Bulk-get bufferD->array:   757.95 ms 100,00%

Using arrays of size 1,000 with 100,000 iterations...

-- Array tests: -----------------------------------------

Loop-write array:           22.10 ms   6,21%
Arrays.fill:                10.37 ms   2,91%
Loop-read array:            81.12 ms  22,79%
System.arraycopy:           10.59 ms   2,97%

-- Buffer tests: ----------------------------------------

Loop-put buffer:           355.98 ms 100,00%
Index-put buffer:          353.80 ms  99,39%
Bulk-put array->buffer:     16.33 ms   4,59%
Bulk-put buffer->buffer:     5.40 ms   1,52%

Bulk-put bufferD->buffer:    4.95 ms   1,39%
Loop-get buffer:           299.95 ms  84,26%
Index-get buffer:          343.05 ms  96,37%
Bulk-get buffer->array:     15.94 ms   4,48%

-- Direct buffer tests: ---------------------------------

Loop-put bufferD:          355.11 ms  99,75%
Index-put bufferD:         348.63 ms  97,93%
Bulk-put array->bufferD:   190.86 ms  53,61%
Bulk-put buffer->bufferD:    5.60 ms   1,57%

Bulk-put bufferD->bufferD:   7.73 ms   2,17%
Loop-get bufferD:          344.10 ms  96,66%
Index-get bufferD:         333.03 ms  93,55%
Bulk-get bufferD->array:   190.12 ms  53,41%

Using arrays of size 10,000 with 100,000 iterations...

-- Array tests: -----------------------------------------

Loop-write array:          156.02 ms   4,37%
Arrays.fill:               109.06 ms   3,06%
Loop-read array:           300.45 ms   8,42%
System.arraycopy:          147.36 ms   4,13%

-- Buffer tests: ----------------------------------------

Loop-put buffer:          3385.94 ms  94,89%
Index-put buffer:         3568.43 ms 100,00%
Bulk-put array->buffer:    159.40 ms   4,47%
Bulk-put buffer->buffer:     5.31 ms   0,15%

Bulk-put bufferD->buffer:    6.61 ms   0,19%
Loop-get buffer:          2907.21 ms  81,47%
Index-get buffer:         3413.56 ms  95,66%
Bulk-get buffer->array:    177.31 ms   4,97%

-- Direct buffer tests: ---------------------------------

Loop-put bufferD:         3319.25 ms  93,02%
Index-put bufferD:        3538.16 ms  99,15%
Bulk-put array->bufferD:  1849.45 ms  51,83%
Bulk-put buffer->bufferD:    5.60 ms   0,16%

Bulk-put bufferD->bufferD:   7.63 ms   0,21%
Loop-get bufferD:         3227.26 ms  90,44%
Index-get bufferD:        3413.94 ms  95,67%
Bulk-get bufferD->array:  1848.24 ms  51,79%

Solution

Direct buffers are not meant to accelerate access from Java code. (If that were possible, something would be wrong with the JVM's own array implementation.)

These byte buffers are for interfacing with other components: you can write a byte buffer to a ByteChannel, and you can use direct buffers in conjunction with native code, such as the OpenGL libraries you mentioned. That is the kind of operation they are intended to accelerate. Using a graphics card for rendering can speed up the overall operation enough to more than compensate for the possibly slower access to the buffer from Java code.
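
For example (purely illustrative: the FileChannel target and the file name "data.bin" are made up), a direct buffer can be handed straight to a channel, which is where its off-heap memory pays off:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

ByteBuffer buf = ByteBuffer.allocateDirect(1024);
buf.putFloat(1.0f).putFloat(2.0f);
buf.flip();                                // switch from filling to draining

try (FileChannel channel = FileChannel.open(Paths.get("data.bin"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    channel.write(buf);                    // FileChannel is a ByteChannel; the direct
} catch (IOException e) {                  // buffer's memory can be passed to the OS
    e.printStackTrace();                   // without an extra copy onto the Java heap
}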

By the way, if you measure the access speed of a byte buffer, especially a direct byte buffer, it's worth changing the byte order to the native byte order before acquiring a FloatBuffer view:

FloatBuffer bufferD = ByteBuffer.allocateDirect(SIZE * 4)
                                .order(ByteOrder.nativeOrder())
                                .asFloatBuffer();
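
As a small usage sketch (the data array is a placeholder; SIZE is the constant from the snippet above), the view then supports the same bulk operations measured in the benchmark:

float[] data = new float[SIZE];
bufferD.put(data);       // bulk-put array->bufferD through the native-order view
bufferD.rewind();
bufferD.get(data);       // bulk-get bufferD->array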
