Out of memory exception while loading large json file from disk


Question

I have a 1.2 GB JSON file which, when deserialized, ought to give me a list of 15 million objects.

The machine on which I'm trying to deserialize it is a Windows 2012 server (64-bit) with 16 cores and 32 GB of RAM.

The application has been built targeting x64.

In spite of this, when I try to read the JSON document and convert it to a list of objects, I get an out of memory exception. When I look at Task Manager, I find that only 5 GB of memory is in use.

The code I have tried is below.

a.

string plays_json = File.ReadAllText("D:\\Hun\\enplays.json");
plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);

b.

string plays_json = "";
using (var reader = new StreamReader("D:\\Hun\\enplays.json"))
{
    plays_json = reader.ReadToEnd();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

c.

using (StreamReader sr = File.OpenText("D:\\Hun\\enplays.json"))
{
    StringBuilder sb = new StringBuilder();
    sb.Append(sr.ReadToEnd());
    string plays_json = sb.ToString();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

Any help is sincerely appreciated.

Answer

The problem is that you are reading the entire huge file into memory and then trying to deserialize it all at once into one huge list. You should be using a StreamReader to process the file incrementally. Example (b) in your question doesn't cut it, even though you use a StreamReader there, because you still read the entire file via ReadToEnd(). You should be doing something like this instead:

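// Requires the Newtonsoft.Json namespace (JsonTextReader, JsonSerializer)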
using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();

    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserialize each object from the stream individually and process it
            var playdata = serializer.Deserialize<playdata>(reader);

            ProcessPlayData(playdata);
        }
    }
}

The ProcessPlayData method should process a single playdata object and then ideally write the result to a file or a database rather than to an in-memory list (otherwise you may find yourself back in the same situation). If you must store the result of processing each item in an in-memory collection, then consider using a linked list or a similar structure that does not try to allocate its memory in one contiguous block and does not need to reallocate and copy itself when it grows.
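For the in-memory case, a minimal sketch of that linked-list variant is below, assuming the same playdata type and file path from the question; LinkedList<T> allocates one node per element rather than one contiguous array, so it never needs a single huge block and never has to grow and copy:

// Sketch only: collect results in a LinkedList<playdata> instead of a List<playdata>.
// Each node is a separate small allocation, so no contiguous array sized for
// millions of elements (and no reallocate-and-copy on growth) is required.
var plays = new LinkedList<playdata>();

using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();

    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserialize one record at a time and append it as a new node.
            plays.AddLast(serializer.Deserialize<playdata>(reader));
        }
    }
}

Even then, fifteen million fully materialized objects will still consume a large amount of memory, so writing results out as you go remains the safer option.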
