How to efficiently store small byte arrays in Java?


Question

By small byte arrays I mean arrays of bytes with a length from 10 up to 30.

By store I mean keeping them in RAM, not serializing and persisting them to the filesystem.


System: macOS 10.12.6, Oracle jdk1.8.0_141 64-bit, JVM args: -Xmx1g

Example: the expected behavior for new byte[200 * 1024 * 1024] is ≈200 MB of heap space.

// Case 1: a single large array of TARGET_SIZE bytes
public static final int TARGET_SIZE = 200 * 1024 * 1024;
public static void main(String[] args) throws InterruptedException {
    byte[] arr = new byte[TARGET_SIZE];
    System.gc();
    System.out.println("Array size: " + arr.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}


// Case 2: the same total payload, split into many 20-byte arrays
public static final int TARGET_SIZE = 200 * 1024 * 1024;
public static void main(String[] args) throws InterruptedException {
    final int oneArraySize = 20;
    final int numberOfArrays = TARGET_SIZE / oneArraySize;
    byte[][] arrays = new byte[numberOfArrays][];
    for (int i = 0; i < numberOfArrays; i++) {
        arrays[i] = new byte[oneArraySize];
    }
    System.gc();
    System.out.println("Arrays size: " + arrays.length);
    System.out.println("HeapSize: " + Runtime.getRuntime().totalMemory());
    Thread.sleep(60000);
}



Where is this overhead coming from? How can I efficiently store and work with small byte arrays (chunks of data)?

For new byte[200*1024*1024][1] it eats

Basic math says that new byte[1] weighs 24 bytes.

According to "What is the memory consumption of an object in Java?", the minimum size of an object in Java is 16 bytes. From my previous "measurements": 24 bytes - 16 bytes (minimum object size) - 4 bytes for the int length - 1 actual byte of my data = 3 bytes of some other garbage padding.
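The 24 bytes observed for new byte[1] can be reproduced with simple arithmetic. The following is a minimal sketch, assuming a 64-bit HotSpot JVM with compressed class pointers and references (typically the default for a 1 GB heap), where an array header is 16 bytes including the length field and every object is padded to a multiple of 8 bytes. These constants are assumptions about that JVM configuration, not guarantees of the Java language, and the class and method names are made up for illustration. Under these assumptions, byte[1] would consist of 16 header bytes, 1 data byte and 7 padding bytes.

```java
public class ArrayFootprintEstimate {
    // Assumed layout constants (64-bit HotSpot, compressed class pointers and references).
    static final long ARRAY_HEADER_BYTES = 16; // 8 mark word + 4 class pointer + 4 length
    static final long ALIGNMENT_BYTES = 8;     // objects are padded to a multiple of 8 bytes
    static final long REFERENCE_BYTES = 4;     // one compressed reference per row in byte[][]

    // Estimated heap size of a single byte[length], including header and padding.
    public static long byteArrayBytes(long length) {
        long raw = ARRAY_HEADER_BYTES + length;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Estimated heap size of `count` arrays of `length` bytes each,
    // plus the outer array that holds one reference per small array.
    public static long manySmallArraysBytes(long count, long length) {
        long outerRaw = ARRAY_HEADER_BYTES + count * REFERENCE_BYTES;
        long outer = (outerRaw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
        return outer + count * byteArrayBytes(length);
    }

    public static void main(String[] args) {
        long targetSize = 200L * 1024 * 1024;
        long numberOfArrays = targetSize / 20;

        System.out.println("byte[1]  : " + byteArrayBytes(1) + " bytes");
        System.out.println("byte[20] : " + byteArrayBytes(20) + " bytes");
        System.out.println("one large array  : " + byteArrayBytes(targetSize) + " bytes");
        System.out.println("many small arrays: "
            + manySmallArraysBytes(numberOfArrays, 20) + " bytes");
    }
}
```

With these assumed constants, each 20-byte array occupies 40 bytes plus a 4-byte reference in the outer array, so the same payload would need roughly 2.2 times the memory of the single large array.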

Recommended answer

The answer by Eugene explains why you are observing such an increase in memory consumption for a large number of arrays. The question in the title, "How to efficiently store small byte arrays in Java?", may then be answered with: Not at all. 1

However, there probably are ways to achieve your goals. As usual, the "best" solution here will depend on how this data is going to be used. A very pragmatic approach would be: define an interface for your data structure.

In the simplest case, this interface could just be:

interface ByteArray2D 
{
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);
}

This offers a basic abstraction of a "2D byte array". Depending on the application case, it may be beneficial to offer additional methods here. The patterns that could be employed are common in matrix libraries, which handle "2D matrices" (usually of float values) and often offer methods like these:

interface Matrix {
    Vector getRow(int row);
    Vector getColumn(int column);
    ...
}

However, when the main purpose is to handle a set of byte[] arrays, a method for accessing each array (that is, each row of the 2D array) could be sufficient:

ByteBuffer getRow(int row);






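Given this interface, it is simple to create different implementations. For example, you could create a simple implementation that just stores a 2D byte[][] array internally: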

class SimpleByteArray2D implements ByteArray2D 
{
    private final byte array[][];
    ...
}

Alternatively, you could create an implementation that internally stores a 1D byte[] array or, analogously, a ByteBuffer:

class CompactByteArray2D implements ByteArray2D
{
    private final ByteBuffer buffer;
    ...
}

This implementation then just has to compute the (1D) index whenever one of the methods for accessing a certain row/column of the 2D array is called.

Below you will find an MCVE that shows this interface and the two implementations, the basic usage of the interface, and a memory footprint analysis using JOL.

The output of this program is:

For 10 rows and 1000 columns:
Total size for SimpleByteArray2D : 10240
Total size for CompactByteArray2D: 10088

For 100 rows and 100 columns:
Total size for SimpleByteArray2D : 12440
Total size for CompactByteArray2D: 10088

For 1000 rows and 10 columns:
Total size for SimpleByteArray2D : 36040
Total size for CompactByteArray2D: 10088

showing that

  • the SimpleByteArray2D implementation, which is based on a simple 2D byte[][] array, requires more memory when the number of rows increases (even if the total size of the array remains constant)

  • the memory consumption of the CompactByteArray2D is independent of the structure of the array
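The totals reported above can be reproduced by hand, which makes the source of the difference explicit. The following is a rough sketch, assuming a 64-bit HotSpot layout with compressed references: a 16-byte array header, 8-byte object alignment, 4-byte references, and a 24-byte wrapper object (12-byte header, two int fields, one reference) for both implementations. The 48 bytes for the ByteBuffer instance are inferred from the reported total of 10088 (10088 - 24 - 10016), not taken from any specification. The class and method names are hypothetical.

```java
public class FootprintBreakdown {
    // Assumed layout constants; they depend on the JVM and its settings.
    static final long ARRAY_HEADER = 16;       // array header including the length field
    static final long ALIGNMENT = 8;           // object sizes are rounded up to 8 bytes
    static final long REFERENCE = 4;           // compressed reference
    static final long WRAPPER_OBJECT = 24;     // 12 header + 2 ints + 1 reference
    static final long BYTE_BUFFER_OBJECT = 48; // inferred from the reported output

    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // byte[rows][cols]: one outer array of references plus one array object per row.
    public static long simpleTotal(int rows, int cols) {
        long outer = align(ARRAY_HEADER + REFERENCE * rows);
        long perRow = align(ARRAY_HEADER + cols);
        return WRAPPER_OBJECT + outer + rows * perRow;
    }

    // One ByteBuffer object wrapping a single backing byte[rows * cols].
    public static long compactTotal(int rows, int cols) {
        long backing = align(ARRAY_HEADER + (long) rows * cols);
        return WRAPPER_OBJECT + BYTE_BUFFER_OBJECT + backing;
    }

    public static void main(String[] args) {
        int[][] shapes = { { 10, 1000 }, { 100, 100 }, { 1000, 10 } };
        for (int[] s : shapes) {
            System.out.println(s[0] + " rows, " + s[1] + " columns: simple="
                + simpleTotal(s[0], s[1]) + ", compact=" + compactTotal(s[0], s[1]));
        }
    }
}
```

In this breakdown, the per-row cost of the byte[][] variant is one header, up to 7 bytes of padding and one reference, which is why it dominates once the rows become short.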

The whole program:

package stackoverflow;

import java.nio.ByteBuffer;

import org.openjdk.jol.info.GraphLayout;

public class EfficientByteArrayStorage
{
    public static void main(String[] args)
    {
        showExampleUsage();
        analyzeMemoryFootprint();
    }

    private static void analyzeMemoryFootprint()
    {
        testMemoryFootprint(10, 1000);
        testMemoryFootprint(100, 100);
        testMemoryFootprint(1000, 10);
    }

    private static void testMemoryFootprint(int rows, int cols)
    {
        System.out.println("For " + rows + " rows and " + cols + " columns:");

        ByteArray2D b0 = new SimpleByteArray2D(rows, cols);
        GraphLayout g0 = GraphLayout.parseInstance(b0);
        System.out.println("Total size for SimpleByteArray2D : " + g0.totalSize());
        //System.out.println(g0.toFootprint());

        ByteArray2D b1 = new CompactByteArray2D(rows, cols);
        GraphLayout g1 = GraphLayout.parseInstance(b1);
        System.out.println("Total size for CompactByteArray2D: " + g1.totalSize());
        //System.out.println(g1.toFootprint());
    }

    // Shows an example of how to use the different implementations
    private static void showExampleUsage()
    {
        System.out.println("Using a SimpleByteArray2D");
        ByteArray2D b0 = new SimpleByteArray2D(10, 10);
        exampleUsage(b0);

        System.out.println("Using a CompactByteArray2D");
        ByteArray2D b1 = new CompactByteArray2D(10, 10);
        exampleUsage(b1);
    }

    private static void exampleUsage(ByteArray2D byteArray2D)
    {
        // Reading elements of the array
        System.out.println(byteArray2D.get(2, 4));

        // Writing elements of the array
        byteArray2D.set(2, 4, (byte)123);
        System.out.println(byteArray2D.get(2, 4));

        // Bulk access to rows
        ByteBuffer row = byteArray2D.getRow(2);
        for (int c = 0; c < row.capacity(); c++)
        {
            System.out.println(row.get(c));
        }

        // (Commented out for this MCVE: Writing one row to a file)
        /*/
        try (FileChannel fileChannel = 
            new FileOutputStream(new File("example.dat")).getChannel())
        {
            fileChannel.write(byteArray2D.getRow(2));
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        //*/
    }

}


interface ByteArray2D 
{
    int getNumRows();
    int getNumColumns();
    byte get(int r, int c);
    void set(int r, int c, byte b);

    // Bulk access to rows, for convenience and efficiency
    ByteBuffer getRow(int row);
}

class SimpleByteArray2D implements ByteArray2D 
{
    private final int rows;
    private final int cols;
    private final byte array[][];

    public SimpleByteArray2D(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        this.array = new byte[rows][cols];
    }

    @Override
    public int getNumRows()
    {
        return rows;
    }

    @Override
    public int getNumColumns()
    {
        return cols;
    }

    @Override
    public byte get(int r, int c)
    {
        return array[r][c];
    }

    @Override
    public void set(int r, int c, byte b)
    {
        array[r][c] = b;
    }

    @Override
    public ByteBuffer getRow(int row)
    {
        return ByteBuffer.wrap(array[row]);
    }
}

class CompactByteArray2D implements ByteArray2D
{
    private final int rows;
    private final int cols;
    private final ByteBuffer buffer;

    public CompactByteArray2D(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        this.buffer = ByteBuffer.allocate(rows * cols);
    }

    @Override
    public int getNumRows()
    {
        return rows;
    }

    @Override
    public int getNumColumns()
    {
        return cols;
    }

    @Override
    public byte get(int r, int c)
    {
        return buffer.get(r * cols + c);
    }

    @Override
    public void set(int r, int c, byte b)
    {
        buffer.put(r * cols + c, b);
    }

    @Override
    public ByteBuffer getRow(int row)
    {
        ByteBuffer r = buffer.slice();
        r.position(row * cols);
        r.limit(row * cols + cols);
        return r.slice();
    }
}

Again, this is mainly intended as a sketch, to show one possible approach. The details of the interface will depend on the intended application pattern.
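One such application pattern is rows of differing length, as in the question (10 to 30 bytes each), which the fixed-column CompactByteArray2D does not cover. The following is a minimal sketch of one possible variation that is not part of the original answer: all chunks are appended to a single backing byte[] and an int[] of offsets marks where each chunk starts. The class name PackedByteChunks and its methods are hypothetical, and the sketch assumes the total payload stays well below 2 GB.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PackedByteChunks {
    private byte[] data = new byte[64];  // all chunks, stored back to back
    private int[] offsets = new int[9];  // chunk i spans offsets[i] .. offsets[i + 1]
    private int size = 0;                // number of chunks stored so far

    // Appends a copy of the chunk and returns its index.
    public int add(byte[] chunk) {
        int start = offsets[size];
        int end = start + chunk.length;
        if (end > data.length) {
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        if (size + 1 >= offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        System.arraycopy(chunk, 0, data, start, chunk.length);
        offsets[size + 1] = end;
        return size++;
    }

    public int size() {
        return size;
    }

    // Length in bytes of the chunk at the given index.
    public int length(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return offsets[index + 1] - offsets[index];
    }

    // Returns a view of the chunk without copying it. The view refers to the
    // backing array as it was at call time, so it can go stale after later adds.
    public ByteBuffer get(int index) {
        return ByteBuffer.wrap(data, offsets[index], length(index)).slice();
    }

    public static void main(String[] args) {
        PackedByteChunks chunks = new PackedByteChunks();
        chunks.add(new byte[10]);
        chunks.add(new byte[30]);
        System.out.println("chunks: " + chunks.size()
            + ", lengths: " + chunks.length(0) + " and " + chunks.length(1));
    }
}
```

The bookkeeping cost here is one int per chunk, instead of one array header, padding and one reference per chunk. The trade-off is that chunks can only be appended, not resized or removed, in this simple form.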

1 Side note:

The problem of the memory overhead is similar in other languages. For example, in C/C++, the structure that most closely resembles a "2D Java array" would be an array of pointers to manually allocated rows:

char** array;
array = new char*[numRows];    // one pointer per row
array[0] = new char[numCols];  // each row is allocated separately
...

In this case, you also have an overhead that is proportional to the number of rows, namely one pointer for each row (4 bytes on 32-bit platforms, usually 8 bytes on 64-bit platforms).
