Issues parsing a 1GB JSON file using JSON.NET

Question

I have been given an application whose input has grown from 50K location records to 1.1 million location records. This has caused serious problems, because the entire file was previously deserialized into a single object. For a production-like file with 1.1 million records, that object is about 1GB. Because of large object GC issues, I want to keep each deserialized object below the 85K mark (the 85,000-byte size at which .NET, by default, starts allocating on the Large Object Heap).
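
For context on the 85K figure, the following is a small check rather than part of the original question. It assumes default GC settings, and `LohCheck` and `GenerationOfNewArray` are hypothetical names. The runtime reports objects on the Large Object Heap as generation 2, while a freshly allocated small object is normally reported as generation 0.

```csharp
using System;

public static class LohCheck
{
    // Allocates a byte array of the given length and returns the GC
    // generation the runtime reports for it immediately afterwards.
    public static int GenerationOfNewArray(int length)
    {
        var buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // Small allocation: normally reported as generation 0.
        Console.WriteLine(GenerationOfNewArray(1000));

        // 85,000 bytes of payload plus the array header crosses the default
        // threshold, so the array is placed on the Large Object Heap.
        Console.WriteLine(GenerationOfNewArray(85000));
    }
}
```

Note that the threshold applies to individual objects such as arrays and strings, not to the total size of an object graph.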

I'm trying to parse out a single location object at a time and deserialize it, so that I can control how many objects are deserialized and, in turn, how large the resulting object is. I'm using the Json.NET library to do this.

Below is a sample of the JSON file that my application receives as a stream.

{
    "Locations": [{
        "LocationId": "",
        "ParentLocationId": "",
        "DisplayFlag": "Y",
        "DisplayOptions": "",
        "DisplayName": "",
        "Address": "",
        "SecondaryAddress": "",
        "City": "",
        "State": "",
        "PostalCode": "",
        "Country": "",
        "Latitude": 40.59485,
        "Longitude": -73.96174,
        "LatLonQuality": 99,
        "BusinessLogoUrl": "",
        "BusinessUrl": "",
        "DisplayText": "",
        "PhoneNumber": "",
        "VenueGroup": 7,
        "VenueType": 0,
        "SubVenue": 0,
        "IndoorFlag": "",
        "OperatorDefined": "",
        "AccessPoints": [{
            "AccessPointId": "",
            "MACAddress": "",
            "DisplayFlag": "",
            "DisplayOptions": "",
            "Latitude": 40.59485,
            "Longitude": -73.96174,
            "Status": "Up",
            "OperatorDefined": "",
            "RoamingGroups": [{
                "GroupName": ""
            },
            {
                "GroupName": ""
            }],
            "Radios": [{
                "RadioId": "",
                "RadioFrequency": "",
                "RadioProtocols": [{
                    "Protocol": ""
                }],
                "WifiConnections": [{
                    "BSSID": "",
                    "ServiceSets": [{
                        "SSID": "",
                        "SSID_Broadcasted": ""
                    }]
                }]
            }]
        }]
    },
    {
        "LocationId": "",
        "ParentLocationId": "",
        "DisplayFlag": "Y",
        "DisplayOptions": "",
        "DisplayName": "",
        "Address": "",
        "SecondaryAddress": "",
        "City": "",
        "State": "",
        "PostalCode": "",
        "Country": "",
        "Latitude": 40.59485,
        "Longitude": -73.96174,
        "LatLonQuality": 99,
        "BusinessLogoUrl": "",
        "BusinessUrl": "",
        "DisplayText": "",
        "PhoneNumber": "",
        "VenueGroup": 7,
        "VenueType": 0,
        "SubVenue": 0,
        "IndoorFlag": "",
        "OperatorDefined": "",
        "AccessPoints": [{
            "AccessPointId": "",
            "MACAddress": "",
            "DisplayFlag": "",
            "DisplayOptions": "",
            "Latitude": 40.59485,
            "Longitude": -73.96174,
            "Status": "Up",
            "OperatorDefined": "",
            "RoamingGroups": [{
                "GroupName": ""
            },
            {
                "GroupName": ""
            }],
            "Radios": [{
                "RadioId": "",
                "RadioFrequency": "",
                "RadioProtocols": [{
                    "Protocol": ""
                }],
                "WifiConnections": [{
                    "BSSID": "",
                    "ServiceSets": [{
                        "SSID": "",
                        "SSID_Broadcasted": ""
                    }]
                }]
            }]
        }]
    }]
}

I need to be able to pull out the individual Location objects, so that each time I would be looking at the following:

{
    "LocationId": "",
    "ParentLocationId": "",
    "DisplayFlag": "Y",
    "DisplayOptions": "",
    "DisplayName": "",
    "Address": "",
    "SecondaryAddress": "",
    "City": "",
    "State": "",
    "PostalCode": "",
    "Country": "",
    "Latitude": 40.59485,
    "Longitude": -73.96174,
    "LatLonQuality": 99,
    "BusinessLogoUrl": "",
    "BusinessUrl": "",
    "DisplayText": "",
    "PhoneNumber": "",
    "VenueGroup": 7,
    "VenueType": 0,
    "SubVenue": 0,
    "IndoorFlag": "",
    "OperatorDefined": "",
    "AccessPoints": [{
        "AccessPointId": "",
        "MACAddress": "",
        "DisplayFlag": "",
        "DisplayOptions": "",
        "Latitude": 40.59485,
        "Longitude": -73.96174,
        "Status": "Up",
        "OperatorDefined": "",
        "RoamingGroups": [{
            "GroupName": ""
        },
        {
            "GroupName": ""
        }],
        "Radios": [{
            "RadioId": "",
            "RadioFrequency": "",
            "RadioProtocols": [{
                "Protocol": ""
            }],
            "WifiConnections": [{
                "BSSID": "",
                "ServiceSets": [{
                    "SSID": "",
                    "SSID_Broadcasted": ""
                }]
            }]
        }]
    }]
}

I'm trying to use the Json.NET JsonTextReader to accomplish this, but I cannot get the reader to hold an entire location in its buffer. Because of the size of the records in the stream, the reader initially reads only as far as "RadioProtocols", which is midway through the object. By the time the stream reaches the end of the object, the reader has discarded the start of the object.

The code I'm using to try to get this working is:

var ser = new JsonSerializer();
using (var reader = new JsonTextReader(new StreamReader(stream)))
{
    reader.SupportMultipleContent = true;

    while (reader.Read())
    {   
        if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
        {                            
            do
            {
                reader.Read();                                
            } while (reader.TokenType != JsonToken.EndObject && reader.Depth == 2);

            var singleLocation = ser.Deserialize<Locations>(reader);
        }
    }
}
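
As an aside, the `reader.Depth == 2` test above can be checked by tracing the tokens that JsonTextReader produces for a cut-down document. This is a minimal sketch, assuming the Newtonsoft.Json package is referenced; `DepthTrace` and `Trace` are hypothetical names used only for illustration. The reader is forward-only, so once a token has been consumed by `Read()` it cannot be revisited.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class DepthTrace
{
    // Returns one "TokenType depth=N" line per token read from the JSON text.
    public static List<string> Trace(string json)
    {
        var lines = new List<string>();
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            while (reader.Read())
            {
                lines.Add($"{reader.TokenType} depth={reader.Depth}");
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Same shape as the sample file, reduced to a single property.
        const string json = "{\"Locations\":[{\"LocationId\":\"a\"}]}";

        // The StartObject token of each location is reported at depth 2;
        // the property names inside it are one level deeper.
        foreach (var line in Trace(json))
        {
            Console.WriteLine(line);
        }
    }
}
```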

Any information on this, or an alternative way of doing it, would be greatly appreciated. As a side note, the way our customers send the information cannot change at this time.

Recommended answer

Thanks for all the help. I've managed to get it doing what I want, which is deserializing individual location objects.

If the item is loaded as a JObject, the reader reads in the full object, which can then be deserialized. Doing this in a loop gives the solution.
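
Stripped of validation, the idea can be sketched as a self-contained program. This is an illustration under stated assumptions, not the original application code: the `Location` class here is a hypothetical three-property stand-in for the real type, `LocationStream` and `ReadLocations` are made-up names, and the input is an in-memory string instead of the customer stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical, reduced stand-in for the real Location type.
public class Location
{
    public string LocationId { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class LocationStream
{
    // Lazily yields one Location at a time, so only a single record's
    // object graph needs to be alive at once.
    public static IEnumerable<Location> ReadLocations(TextReader text)
    {
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                // Each element of the "Locations" array starts at depth 2.
                if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
                {
                    // JObject.Load consumes tokens up to the matching EndObject.
                    yield return JObject.Load(reader).ToObject<Location>();
                }
            }
        }
    }

    public static void Main()
    {
        const string json =
            "{\"Locations\":[" +
            "{\"LocationId\":\"a\",\"Latitude\":40.59485,\"Longitude\":-73.96174}," +
            "{\"LocationId\":\"b\",\"Latitude\":1.0,\"Longitude\":2.0}]}";

        foreach (var location in ReadLocations(new StringReader(json)))
        {
            Console.WriteLine(location.LocationId);
        }
    }
}
```

For a real file, the `StringReader` would be replaced by a `StreamReader` over the incoming stream.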

This is the code that was settled on:

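// Body of an iterator method; reader, location, errors, elNumber and
// maxErrors are declared elsewhere in the application.
while (reader.Read())
{
    // Each element of the "Locations" array starts at depth 2.
    if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
    {
        // Load exactly one location object, then convert it to the typed class.
        location = JObject.Load(reader).ToObject<Location>();

        var lv = new LocationValidator(location, FootprintInfo.OperatorId, FootprintInfo.RoamingGroups, true);
        var vr = lv.IsValid();
        if (vr.Successful)
        {
            yield return location;
        }
        else
        {
            // Record the failure and stop once the error limit is reached.
            errors.Add(new Error(elNumber, location.LocationId, vr.Error.Field, vr.Error.Detail));
            if (errors.Count >= maxErrors)
            {
                yield break;
            }
        }

        ++elNumber;
    }
}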
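
A possible variation, not part of the original answer, is to skip the intermediate JObject and let `JsonSerializer.Deserialize<T>(JsonReader)` read the object the reader is currently positioned on. The sketch below assumes the Newtonsoft.Json package; `LocationRecord`, `DirectLocationStream` and `ReadLocations` are hypothetical names, and the validation step from the code above is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical, reduced stand-in for the real Location type.
public class LocationRecord
{
    public string LocationId { get; set; }
    public int VenueGroup { get; set; }
}

public static class DirectLocationStream
{
    public static IEnumerable<LocationRecord> ReadLocations(TextReader text)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
                {
                    // Deserializes only the current object; the reader is left
                    // on that object's EndObject token afterwards.
                    yield return serializer.Deserialize<LocationRecord>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        const string json =
            "{\"Locations\":[{\"LocationId\":\"a\",\"VenueGroup\":7}," +
            "{\"LocationId\":\"b\",\"VenueGroup\":3}]}";

        foreach (var record in ReadLocations(new StringReader(json)))
        {
            Console.WriteLine($"{record.LocationId}: {record.VenueGroup}");
        }
    }
}
```

This avoids building a JObject tree for every record, at the cost of losing the chance to inspect the raw JSON before converting it.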
