How to avoid OutOfMemory exception while reading large files in Java


Problem description


I am working on an application that reads large amounts of data from a file. Basically, I have a huge file (around 1.5-2 gigs) containing different objects (~5 to 10 million of them per file). I need to read all of them and put them into different maps in the app. The problem is that at some point the app runs out of memory while reading the objects. Only when I set it to use -Xmx4096m can it handle the file. But if the file gets larger, it won't be able to do that anymore.

Here is the code snippet:

import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.ObjectInputStream;

String sampleFileName = "sample.file";
int bufferSize = 16 * 1024;
// try-with-resources closes both streams automatically, even on error
try (FileInputStream fileInputStream = new FileInputStream(sampleFileName);
     ObjectInputStream objectInputStream =
             new ObjectInputStream(new BufferedInputStream(fileInputStream, bufferSize))) {
    while (true) {
        try {
            Object objectToRead = objectInputStream.readUnshared();
            if (objectToRead == null) {
                break; // only reached if a null was explicitly written to the stream
            }
            // doing something with the object
        } catch (EOFException eofe) {
            break; // ObjectInputStream signals end of stream via EOFException
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}


First of all, I was using objectInputStream.readObject() instead of objectInputStream.readUnshared(), and switching partially solved the issue. When I increased the memory from 2048 to 4096, it started parsing the file. A BufferedInputStream is already in use. On the web I've found only examples of how to read lines or bytes, but nothing about objects from a performance point of view.


How can I read the file without increasing the JVM memory, and avoid the OutOfMemory exception? Is there any way to read objects from the file without keeping anything else in memory?

Recommended answer


When reading big files, parsing objects, and keeping them in memory, there are several solutions, each with tradeoffs:



  1. You can fit all parsed objects into memory for the app deployed on one server. This either requires storing all objects in a very compact way, for example using a single byte or integer to store two numbers, or using some kind of bit shifting in other data structures; in other words, fitting all objects into the minimum possible space. Alternatively, increase the memory of that server (scale vertically).
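The "two numbers in one field" idea can be sketched with plain bit shifting. This is a hypothetical `PackedPair` helper, assuming each value fits in 16 bits:

```java
// Sketch of the compact-storage idea: pack two 16-bit values into one int.
// Assumes both values fit in 16 bits; names are illustrative only.
public class PackedPair {
    // pack a and b (each 0..65535) into a single int
    static int pack(int a, int b) {
        return (a << 16) | (b & 0xFFFF);
    }

    static int first(int packed) {
        return packed >>> 16;   // upper 16 bits
    }

    static int second(int packed) {
        return packed & 0xFFFF; // lower 16 bits
    }

    public static void main(String[] args) {
        int p = pack(12345, 54321);
        System.out.println(first(p));  // 12345
        System.out.println(second(p)); // 54321
    }
}
```

Halving per-field overhead this way adds up quickly over 5-10 million objects, at the cost of readability and range checks.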


a) However, reading the files can take too much memory, so you have to read them in chunks. For example, this is what I was doing with JSON files, using Gson's streaming JsonReader:

import com.google.gson.stream.JsonReader;
import java.io.InputStreamReader;

JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
if (reader.hasNext()) {
    reader.beginObject();
    String name = reader.nextName();

    if ("content".equals(name)) {
        reader.beginArray();
        // helper that consumes one array element at a time
        parseContentJsonArray(reader, name2ContentMap);
        reader.endArray();
    }

    name = reader.nextName();
    if ("ad".equals(name)) {
        reader.beginArray();
        parsePrerollJsonArray(reader, prerollMap);
        reader.endArray();
    }
}


The idea is to have a way to identify where a given object starts and ends, and to read only that part.
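The same idea applies to the asker's serialized objects: process each object as soon as it is deserialized and keep only a small aggregate, so the object itself can be garbage collected. A sketch, assuming the per-object work can be reduced to updating a map; the key extraction via toString() is a placeholder for whatever property the real objects expose:

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.HashMap;
import java.util.Map;

// Sketch: stream serialized objects one at a time and keep only a small
// aggregate, so each deserialized object becomes garbage after processing.
public class StreamingReader {
    static Map<String, Integer> countByKey(String fileName) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(fileName), 16 * 1024))) {
            while (true) {
                try {
                    Object obj = in.readUnshared();
                    // keep only the derived value, not the object itself
                    counts.merge(obj.toString(), 1, Integer::sum);
                } catch (EOFException eof) {
                    break; // end of stream
                }
            }
        }
        return counts;
    }
}
```

Memory use then stays proportional to the aggregate (here, the number of distinct keys) rather than to the number of objects in the file.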


b) If you can, you can also split the files into smaller ones at the source; then they will be easier to read.


  2. You can't fit all parsed objects for the app on one server. In this case you have to shard based on some object property, for example splitting the data across multiple servers based on US state.
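Shard selection from a property can be sketched as a simple hash modulo the number of servers; the server names and the state value below are illustrative:

```java
// Sketch of property-based sharding: route each record to a server
// by hashing a chosen property. The server list is illustrative.
public class Sharder {
    static int shardFor(String property, int shardCount) {
        // Math.floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(property.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        String[] servers = {"server-0", "server-1", "server-2"};
        int idx = shardFor("California", servers.length);
        System.out.println(servers[idx]); // the same state always maps to the same server
    }
}
```

Hash-based routing is deterministic, so both the writer that splits the data and the readers that query it can compute the target server independently.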


Hopefully this helps with your solution.

