Out of memory when converting a large stream to a string

Problem description

I am trying to convert a large stream (4 MB) to a string, which I eventually convert to a JSON array.

When the stream is small (a few KB) everything works fine, but as soon as it starts to process the 4 MB stream it runs out of memory.

Below is what I use to convert the stream to a string. I've tried almost everything, and I suspect the issue is with the while loop. Can someone please help?

public String convertStreamToString(InputStream is)
        throws IOException {

    if (is != null) {
        Writer writer = new StringWriter();

        char[] buffer = new char[1024];
        try {
            Reader reader = new BufferedReader(
                    new InputStreamReader(is, "UTF-8"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    } else {
        return "";
    }
}


Update: OK, this is where I have got to at the moment. Am I on the right track? I think I am close, but I'm not sure what else I can close or flush to regain memory.

public String convertStreamToString(InputStream is)
        throws IOException {

    String encoding = "UTF-8";
    int maxlines = 2000;
    StringWriter sWriter = new StringWriter(7168);
    BufferedWriter writer = new BufferedWriter(sWriter);
    BufferedReader reader = null;
    if (is == null) {
        return "";
    } else {
        try {
            int count = 0;
            reader = new BufferedReader(new InputStreamReader(is, encoding));
            for (String line; (line = reader.readLine()) != null;) {
                if (count++ % maxlines == 0) {
                    sWriter.close();
                    // not sure what else to close or flush here to regain memory
                    //Log.v("Max Lines Reached", "Max Lines Reached");;
                }

                writer.write(line);
            }
            Log.v("Finished Loop", "Looping over");
        } finally {
            is.close();
            writer.close();
        }
        return writer.toString();
    }
}

Recommended answer

StringWriter writes to a StringBuffer internally. A StringBuffer is basically a wrapper around a char array. That array has a certain capacity. When that capacity is insufficient, StringBuffer allocates a new, larger char array and copies the contents of the previous one into it. At the end you call toString() on the StringWriter, which copies the contents of the char array once more, this time into the char array of the resulting String.

If you have any means of knowing beforehand what the needed capacity is, you should use the StringWriter constructor that sets the initial capacity. That avoids needlessly copying arrays to grow the buffer.
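As a rough illustration of the pre-sizing idea, here is a minimal sketch. The class name `PresizedStreamReader` and the `expectedChars` parameter are hypothetical; the sketch assumes the caller has some size hint, such as a content length. For UTF-8 input the byte count is a safe upper bound on the number of chars, so it can serve as that hint.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

final class PresizedStreamReader {

    // expectedChars is a hypothetical size hint supplied by the caller.
    static String convertStreamToString(InputStream is, int expectedChars)
            throws IOException {
        if (is == null) {
            return "";
        }
        // Allocate the internal buffer once, so it should not need to grow.
        StringWriter writer = new StringWriter(Math.max(expectedChars, 16));
        char[] buffer = new char[1024];
        // Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
        // toString() still makes one final copy of the characters.
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "[{\"id\":1},{\"id\":2}]".getBytes(StandardCharsets.UTF_8);
        String s = convertStreamToString(new ByteArrayInputStream(data), data.length);
        System.out.println(s);
    }
}
```

If the hint is too small the code still works; the buffer simply grows and copies as before.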

Yet that doesn't avoid the final copy that happens in toString(). If you're dealing with streams that can be large, you may need to reconsider whether you really need that input stream as a String. Using a sufficiently large char array directly would avoid all the copying, and would greatly reduce memory usage.
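One possible reading of "using a char array directly" is sketched below. The class `CharArrayStreamReader` and the method `readInto` are hypothetical names, and the sketch assumes the caller can supply an array that is large enough. The decoded text stays in that array and can be viewed as a `CharSequence` through `CharBuffer.wrap` without another copy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class CharArrayStreamReader {

    // Decodes the stream straight into target and returns the number of
    // chars stored. Throws if the stream does not fit into target.
    static int readInto(InputStream is, char[] target) throws IOException {
        int filled = 0;
        try (Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            while (filled < target.length) {
                int n = reader.read(target, filled, target.length - filled);
                if (n == -1) {
                    return filled; // end of stream, everything fitted
                }
                filled += n;
            }
            // The array is full; make sure nothing is left in the stream.
            if (reader.read() != -1) {
                throw new IOException("stream is larger than the supplied buffer");
            }
            return filled;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "[{\"id\":1},{\"id\":2}]".getBytes(StandardCharsets.UTF_8);
        char[] chars = new char[data.length]; // byte count bounds the char count for UTF-8
        int length = readInto(new ByteArrayInputStream(data), chars);
        // A CharSequence view over the array, without copying it.
        CharSequence view = CharBuffer.wrap(chars, 0, length);
        System.out.println(view);
    }
}
```

Whether this helps depends on the consumer: it only saves memory if the later processing can work on a `char[]` or `CharSequence` rather than insisting on a `String`.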

The ultimate solution would be to do some of the processing of the input before all of it has arrived, so that characters that have already been processed can be discarded. That way you only need to hold in memory as much as a single processing step requires.
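The incremental approach might look roughly like the sketch below. `ChunkedStreamProcessor`, `ChunkHandler` and `countChar` are hypothetical names introduced for illustration, and counting a character is only a stand-in for real processing. For actual JSON, a streaming parser (for example `android.util.JsonReader` on Android) would take the place of the handler; that part is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ChunkedStreamProcessor {

    // Hypothetical callback: receives each decoded chunk exactly once.
    interface ChunkHandler {
        void handle(char[] chunk, int length);
    }

    // Feeds the stream to the handler in pieces of at most 1024 chars.
    // The same small buffer is reused, so memory use stays constant
    // regardless of how large the stream is.
    static void process(InputStream is, ChunkHandler handler) throws IOException {
        char[] buffer = new char[1024];
        try (Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                handler.handle(buffer, n);
            }
        }
    }

    // Stand-in processing step: counts how often target occurs.
    static long countChar(InputStream is, char target) throws IOException {
        long[] count = {0};
        process(is, (chunk, length) -> {
            for (int i = 0; i < length; i++) {
                if (chunk[i] == target) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "[{\"id\":1},{\"id\":2}]".getBytes(StandardCharsets.UTF_8);
        // Naive: this would also count braces inside string values.
        System.out.println(countChar(new ByteArrayInputStream(data), '{'));
    }
}
```

The handler must not keep a reference to the chunk array, because its contents are overwritten by the next read.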
