How to parse a huge JSON file as a stream in Json.NET?

Question

I have a very, very large JSON file (1000+ MB) of identical JSON objects. For example:

[
    {
        "id": 1,
        "value": "hello",
        "another_value": "world",
        "value_obj": {
            "name": "obj1"
        },
        "value_list": [
            1,
            2,
            3
        ]
    },
    {
        "id": 2,
        "value": "foo",
        "another_value": "bar",
        "value_obj": {
            "name": "obj2"
        },
        "value_list": [
            4,
            5,
            6
        ]
    },
    {
        "id": 3,
        "value": "a",
        "another_value": "b",
        "value_obj": {
            "name": "obj3"
        },
        "value_list": [
            7,
            8,
            9
        ]

    },
    ...
]

Every single item in the root JSON list follows the same structure and thus would be individually deserializable. I already have the C# classes written to receive this data, and deserializing a JSON file containing a single object without the list works as expected.
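
For reference, a minimal sketch of what such classes might look like for the sample above. The class and property names (`MyObject`, `ValueObj`, and so on) are assumptions made for illustration; the original classes are not shown in the question.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical shape of the nested "value_obj" object.
public class ValueObj
{
    [JsonProperty("name")]
    public string Name { get; set; }
}

// Hypothetical shape of one element of the root JSON array.
public class MyObject
{
    [JsonProperty("id")]
    public int Id { get; set; }

    [JsonProperty("value")]
    public string Value { get; set; }

    [JsonProperty("another_value")]
    public string AnotherValue { get; set; }

    [JsonProperty("value_obj")]
    public ValueObj ValueObj { get; set; }

    [JsonProperty("value_list")]
    public List<int> ValueList { get; set; }
}
```

With classes of this shape, `JsonConvert.DeserializeObject<MyObject>(json)` handles a file that contains one such object and no surrounding list.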

At first, I tried to just directly deserialize my objects in a loop:

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<MyObject>(reader);
    }
}

This didn't work; it threw an exception clearly stating that an object is expected, not a list. My understanding is that this command would just read a single object contained at the root level of the JSON file, but since we have a list of objects, this is an invalid request.
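
One way to check this is to look at the first token the reader produces. The sketch below uses a hypothetical helper, `TokenPeek.FirstToken`, which is not part of Json.NET. It reads a single token from a JSON string and returns its type.

```csharp
using System.IO;
using Newtonsoft.Json;

public static class TokenPeek
{
    // Hypothetical helper: returns the type of the first token in the JSON text.
    public static JsonToken FirstToken(string json)
    {
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            reader.Read(); // advance to the first token
            return reader.TokenType;
        }
    }
}
```

For input that starts with `[`, the first token is `JsonToken.StartArray`. For a lone object it is `JsonToken.StartObject`. That matches the exception: the serializer was asked for an object but was handed the start of an array.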

My next idea was to deserialize as a C# List of objects:

JsonSerializer serializer = new JsonSerializer();
List<MyObject> o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<List<MyObject>>(reader);
    }
}

This does succeed. However, it only somewhat reduces the issue of high RAM usage. In this case it does look like the application is deserializing items one at a time, and so is not reading the entire JSON file into RAM, but we still end up with a lot of RAM usage because the C# List object now contains all of the data from the JSON file in RAM. This has only displaced the problem.

I then decided to simply try taking a single character off the beginning of the stream (to eliminate the [) by doing sr.Read() before going into the loop. The first object then does read successfully, but subsequent ones do not, with an exception of "unexpected token". My guess is this is the comma and space between the objects throwing the reader off.

Simply removing square brackets won't work since the objects do contain a primitive list of their own, as you can see in the sample. Even trying to use }, as a separator won't work since, as you can see, there are sub-objects within the objects.

My goal is to read the objects from the stream one at a time: read an object, do something with it, discard it from RAM, then read the next object, and so on. This would eliminate the need to load either the entire JSON string or the entire contents of the data into RAM as C# objects.

What am I missing?

Recommended answer

This should resolve your problem. Basically it works just like your initial code, except that it only deserializes an object when the reader hits the { character in the stream; otherwise it just skips ahead until it finds another start-object token.

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // deserialize only when there's "{" character in the stream
        if (reader.TokenType == JsonToken.StartObject)
        {
            o = serializer.Deserialize<MyObject>(reader);
        }
    }
}
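
The same loop can also be wrapped in an iterator so that callers process one object at a time with `foreach`. This is only a sketch: the class and method names (`JsonArrayStream`, `ReadObjects`) are made up for illustration and are not part of Json.NET. It relies on `Deserialize` consuming every token up to the matching end of the object, which is why nested objects such as `value_obj` do not trigger a second deserialization.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonArrayStream
{
    // Hypothetical helper: lazily yields each top-level object in the JSON text.
    public static IEnumerable<T> ReadObjects<T>(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Deserialize reads through the matching EndObject,
                    // so nested objects are never seen by this loop.
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

Iterating over `JsonArrayStream.ReadObjects<MyObject>(new StreamReader("bigfile.json"))` in a `foreach` keeps only the current object referenced, so each one becomes eligible for garbage collection once the loop moves on, provided the caller does not store it elsewhere.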
