Read binary file with a buffer


Question

I'm trying to read a binary file containing 100,000 different objects.
Buffering a simple text file with the same content takes only 2 MB with a BufferedReader.

But reading the binary file takes up to 700 MB, and I get an OutOfMemoryError if I increase the number of objects to read.

So how can I read the file and get the objects one by one without saturating the memory?

Here is the code I'm testing:

import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;

public class ReadBinaryFile {
    public static void main(String[] args) throws Exception {
        int i = 0;
        String path = "data/file.bin";
        InputStream file = new FileInputStream(path);
        InputStream buffer = new BufferedInputStream(file);
        // try-with-resources closes the whole stream chain when reading ends
        try (ObjectInputStream in = new ObjectInputStream(buffer)) {
            Object obj = null;
            while (i < 100000 && (obj = in.readObject()) != null) {
                String str = obj.toString();
                System.out.println(str);
                i++;
            }
        } catch (EOFException e) {
            // readObject() signals the end of the file with EOFException, not null
        }

        timeTaken();
    }

    // Reports the elapsed time and the heap memory currently in use
    private static final long startTime = System.currentTimeMillis();
    private static final long MEGABYTE = 1024L * 1024L;

    public static void timeTaken() {
        Runtime runtime = Runtime.getRuntime();
        long endTime = System.currentTimeMillis();
        long memory = runtime.totalMemory() - runtime.freeMemory();
        long megabytes = memory / MEGABYTE;
        System.out.println("It took " + megabytes + " MB in " + ((endTime - startTime) / 1000)
                + " s (" + memory + " bytes in " + (endTime - startTime) + " ms)");
    }
}

Recommended answer

As far as I know, ObjectInputStream keeps a reference to every object it has read until the stream is closed, so none of them can be garbage collected. If your binary file is ~207 MB, the deserialized objects in the Java heap may easily take several GB of RAM. So the question is: do you need all of your data to be held in RAM simultaneously?
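The caching described above can be observed in a small experiment. The following is a minimal sketch, not part of the original answer: the class name `HandleTableDemo` and the helper `readsSameInstance` are hypothetical, and it uses in-memory byte arrays instead of a file. It writes the same array twice; without a reset the reader hands back one shared instance (it remembered the first one), while calling `ObjectOutputStream.reset()` between the writes makes both sides forget earlier objects. This only helps if you also control the code that writes the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class HandleTableDemo {

    /** Writes the same array twice and reports whether the reader returns one shared instance. */
    static boolean readsSameInstance(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                int[] record = {1, 2, 3};
                out.writeObject(record);
                if (resetBetweenWrites) {
                    // Discards the writer's memory of objects already written;
                    // the reader drops its references at the same point in the stream
                    out.reset();
                }
                out.writeObject(record);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                Object first = in.readObject();
                Object second = in.readObject();
                // Identity comparison: true means the second read was a back-reference
                return first == second;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + readsSameInstance(false));
        System.out.println("with reset:    " + readsSameInstance(true));
    }
}
```

Run as written, this should print `true` for the stream without a reset and `false` for the one with it. In a real writer you would probably call `reset()` every few thousand objects rather than after each one, since each reset also forces class descriptors to be written again.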

If you do not need all of the data in RAM at once (you want to read an object, process it somehow, discard it, and move on to the next one), I would suggest using DataInputStream instead of ObjectInputStream. I don't know whether this approach is applicable in your case, since I don't know the structure of your data. If your data is a collection of records with the same structure, you can do the following:

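    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class MyObject {
        private final int age;
        private final String name;

        public MyObject(int age, String name) {
            this.age = age;
            this.name = name;
        }

        public static void main(String[] args) throws IOException {
            // try-with-resources closes the stream even if reading fails
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream("path.to.file")))) {
                // suppose that we store the total number of objects in the first 4 bytes of the file
                int nObjects = in.readInt();
                for (int i = 0; i < nObjects; i++) {
                    // each record is an int (age) followed by a string written with writeUTF (name)
                    MyObject obj = new MyObject(in.readInt(), in.readUTF());
                    // do some stuff with obj; it can be garbage collected after this iteration
                }
            }
        }
    }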
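
The reader above only works if the file was written in the matching layout: a 4-byte record count, then an int and a `writeUTF` string per record. The following is a minimal sketch of such a writer together with a streaming reader, assuming you control how the file is produced. The class `RecordFileDemo`, its method names, and the sample records are all hypothetical and only serve to illustrate the round trip.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RecordFileDemo {

    /** Writes the record count, then one (int age, UTF name) pair per record. */
    static void writeRecords(File file, int[] ages, String[] names) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(ages.length);
            for (int i = 0; i < ages.length; i++) {
                out.writeInt(ages[i]);
                out.writeUTF(names[i]);
            }
        }
    }

    /** Streams the records back one at a time and returns the sum of all ages. */
    static long sumOfAges(File file) throws IOException {
        long sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int nObjects = in.readInt();
            for (int i = 0; i < nObjects; i++) {
                int age = in.readInt();
                String name = in.readUTF(); // must be read to advance to the next record
                sum += age;
            }
        }
        return sum;
    }

    /** Writes two sample records to a temporary file and reads them back. */
    static long roundTrip() {
        try {
            File file = File.createTempFile("records", ".bin");
            file.deleteOnExit();
            writeRecords(file, new int[] {30, 35}, new String[] {"Ann", "Bob"});
            return sumOfAges(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Because DataInputStream keeps no record of what it has already returned, each record becomes eligible for garbage collection as soon as the loop moves on, so memory use should stay roughly constant regardless of the number of records.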
