How to parse huge JSON file as stream in Json.NET?


Problem description

I have a very, very large JSON file (1000+ MB) of identical JSON objects. For example:

[
    {
        "id": 1,
        "value": "hello",
        "another_value": "world",
        "value_obj": {
            "name": "obj1"
        },
        "value_list": [
            1,
            2,
            3
        ]
    },
    {
        "id": 2,
        "value": "foo",
        "another_value": "bar",
        "value_obj": {
            "name": "obj2"
        },
        "value_list": [
            4,
            5,
            6
        ]
    },
    {
        "id": 3,
        "value": "a",
        "another_value": "b",
        "value_obj": {
            "name": "obj3"
        },
        "value_list": [
            7,
            8,
            9
        ]
    },
    ...
]

Every single item in the root JSON list follows the same structure and thus would be individually deserializable. I already have the C# classes written to receive this data, and deserializing a JSON file containing a single object without the list works as expected.

At first, I tried to just directly deserialize my objects in a loop:

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<MyObject>(reader);
    }
}

This didn't work: it threw an exception clearly stating that an object is expected, not a list. My understanding is that this call reads a single object at the root level of the JSON file, but since the root is a list of objects, the request is invalid.

My next idea was to deserialize as a C# List of objects:

JsonSerializer serializer = new JsonSerializer();
List<MyObject> o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<List<MyObject>>(reader);
    }
}

This does succeed. However, it only somewhat reduces the issue of high RAM usage. In this case it does look like the application is deserializing items one at a time, and so is not reading the entire JSON file into RAM, but we still end up with a lot of RAM usage because the C# List object now contains all of the data from the JSON file in RAM. This has only displaced the problem.

I then tried taking a single character off the beginning of the stream (to eliminate the [) by calling sr.Read() before going into the loop. The first object then reads successfully, but subsequent ones fail with an "unexpected token" exception. My guess is that the comma and space between the objects are throwing the reader off.

Simply removing square brackets won't work since the objects do contain a primitive list of their own, as you can see in the sample. Even trying to use }, as a separator won't work since, as you can see, there are sub-objects within the objects.

My goal is to read the objects from the stream one at a time: read an object, do something with it, discard it from RAM, then read the next object, and so on. This would eliminate the need to load either the entire JSON string or the entire contents of the data into RAM as C# objects.

What am I missing?

Solution

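This should resolve your problem. It works just like your initial code, except that it only deserializes an object when the reader hits a { character in the stream; otherwise it keeps reading until it finds another start-object token. Each Deserialize call consumes the whole object, including any nested objects and lists, so the next start-object token the loop sees is the next item in the root list.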

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // Deserialize only when the reader is positioned at the start of an object ("{")
        if (reader.TokenType == JsonToken.StartObject)
        {
            o = serializer.Deserialize<MyObject>(reader);
        }
    }
}
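
If you want to keep the "read one, process it, discard it" loop separate from the parsing, the same idea can be wrapped in an iterator. The following is only a sketch under some assumptions: JsonStream.ReadItems is a hypothetical helper name (it is not part of Json.NET), and the MyObject and ValueObj classes are guessed from the sample JSON, so your real classes may look different. The small in-memory sample stands in for bigfile.json; a StreamReader over the file should work the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed shape of one item, inferred from the sample JSON in the question.
public class MyObject
{
    [JsonProperty("id")] public int Id { get; set; }
    [JsonProperty("value")] public string Value { get; set; }
    [JsonProperty("another_value")] public string AnotherValue { get; set; }
    [JsonProperty("value_obj")] public ValueObj ValueObj { get; set; }
    [JsonProperty("value_list")] public List<int> ValueList { get; set; }
}

public class ValueObj
{
    [JsonProperty("name")] public string Name { get; set; }
}

public static class JsonStream
{
    // Hypothetical helper: lazily yields each object of the root array.
    // Only the item currently being processed is materialized by this method.
    public static IEnumerable<T> ReadItems<T>(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // Deserialize<T> consumes the whole object, nested values included,
                // so this check only ever fires for items of the root array.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Small stand-in for the real file; for bigfile.json, pass
        // new StreamReader("bigfile.json") instead of the StringReader.
        const string sample =
            "[{\"id\":1,\"value\":\"hello\",\"another_value\":\"world\"," +
            "\"value_obj\":{\"name\":\"obj1\"},\"value_list\":[1,2,3]}," +
            "{\"id\":2,\"value\":\"foo\",\"another_value\":\"bar\"," +
            "\"value_obj\":{\"name\":\"obj2\"},\"value_list\":[4,5,6]}]";

        foreach (MyObject item in JsonStream.ReadItems<MyObject>(new StringReader(sample)))
        {
            // Process the item here; nothing keeps a reference to it afterwards.
            Console.WriteLine(item.Id + " " + item.ValueObj.Name + " " + item.ValueList.Count);
        }
    }
}
```

Because the method uses yield return, nothing is parsed until the foreach asks for the next item, so memory use should stay roughly proportional to one object rather than to the whole file. Calling ToList() on the result would bring back the original problem.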
