Parse huge OData JSON by streaming certain sections of the JSON to avoid LOH

Question

I have an OData response as JSON (a few MB in size), and the requirement is to stream "certain parts of the JSON" without loading them into memory.

For example, when I'm reading the property "value[0].Body.Content" in the JSON below (which can be several MB), I want to stream this value without deserializing it into an object of type string. Basically, I want to read the value into a fixed-size byte array, write that byte array to the destination stream, and repeat until all of the data has been processed.

JSON:

{
    "@odata.context": "https://localhost:5555/api/v2.0/$metadata#Me/Messages",
    "value": [
        {
            "@odata.id": "https://localhost:5555/api/v2.0/",
            "@odata.etag": "W/\"Something\"",
            "Id": "vccvJHDSFds43hwy98fh",
            "CreatedDateTime": "2018-12-01T01:47:53Z",
            "LastModifiedDateTime": "2018-12-01T01:47:53Z",
            "ChangeKey": "SDgf43tsdf",
            "WebLink": "https://localhost:5555/?ItemID=dfsgsdfg9876ijhrf",
            "Body": {
                "ContentType": "HTML",
                "Content": "<html>\r\n<body>Huge Data Here\r\n</body>\r\n</html>\r\n"
            },
            "ToRecipients": [{
                    "EmailAddress": {
                        "Name": "ME",
                        "Address": "me@me.com"
                    }
                }
            ],
            "CcRecipients": [],
            "BccRecipients": [],
            "ReplyTo": [],
            "Flag": {
                "FlagStatus": "NotFlagged"
            }
        }
    ],
    "@odata.nextLink": "http://localhost:5555/rest/jersey/sleep?%24filter=LastDeliveredDateTime+ge+2018-12-01+and+LastDeliveredDateTime+lt+2018-12-02&%24top=50&%24skip=50"
}

Approaches Tried:
1. Newtonsoft

I initially tried using Newtonsoft's streaming reader, but it internally converts the data into a string and loads it into memory. This causes LOH usage to shoot up, and the memory is not released until compaction happens. We have a memory limit for our worker process and cannot keep this data in memory.

**Code:**

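    using System.Text;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    // sr: a TextReader (e.g. StreamReader) over the OData response; destinationStream: the output Stream.
    // CustomArrayPool is the asker's own IArrayPool<char> implementation (not shown).
    using (var jsonTextReader = new JsonTextReader(sr))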
    {
        var pool = new CustomArrayPool();
        // Checking if pooling will help with memory
        jsonTextReader.ArrayPool = pool;

        while (jsonTextReader.Read())
        {
            if (jsonTextReader.TokenType == JsonToken.PropertyName
                && ((string)jsonTextReader.Value).Equals("value"))
            {
                jsonTextReader.Read();

                if (jsonTextReader.TokenType == JsonToken.StartArray)
                {
                    while (jsonTextReader.Read())
                    {
                        if (jsonTextReader.TokenType == JsonToken.StartObject)
                        {
                            var Current = JToken.Load(jsonTextReader);
                            // By Now, the LOH Shoots up.
                            // Avoid below code of converting this JToken back to byte array.
                            var bytes = Encoding.ASCII.GetBytes(Current.ToString());
                            destinationStream.Write(bytes, 0, bytes.Length);
                        }
                        else if (jsonTextReader.TokenType == JsonToken.EndArray)
                        {
                            break;
                        }
                    }
                }
            }

            if (jsonTextReader.TokenType == JsonToken.StartObject)
            {
                var Current = JToken.Load(jsonTextReader);
                // Do some processing with Current
                var bytes = Encoding.ASCII.GetBytes(Current.ToString());
                destinationStream.Write(bytes, 0, bytes.Length);
            }
        }
    }

2. OData.Net:

I wondered whether this is doable using the OData.Net library, since it appears to support streaming of string fields. But I couldn't get far: I ended up creating a model for the data, which means the value would get converted into a single string object several MB in size.

**Code:**

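using Microsoft.OData;
using Microsoft.OData.Edm;

ODataMessageReaderSettings settings = new ODataMessageReaderSettings();
// InMemoryMessage is not part of the OData library; it is a custom IODataResponseMessage wrapping the response stream.
IODataResponseMessage responseMessage = new InMemoryMessage { Stream = stream };
responseMessage.SetHeader("Content-Type", "application/json;odata.metadata=minimal;");
// ODataMessageReader reader = new ODataMessageReader((IODataResponseMessage)message, settings, GetEdmModel());
ODataMessageReader reader = new ODataMessageReader(responseMessage, settings, new EdmModel());
// Note: an ODataMessageReader can read only one payload, so only one of the two calls below can be used.
var oDataResourceReader = reader.CreateODataResourceReader();
var property = reader.ReadProperty();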


Any idea how to parse this JSON in parts using OData.Net or Newtonsoft and stream the values of certain fields?
Or is manually parsing the stream the only way to do this?

Recommended answer

If you are copying portions of JSON from one stream to another, you can do this more efficiently with JsonWriter.WriteToken(JsonReader) thus avoiding the intermediate Current = JToken.Load(jsonTextReader) and Encoding.ASCII.GetBytes(Current.ToString()) representations and their associated memory overhead:

using (var textWriter = new StreamWriter(destinationStream, new UTF8Encoding(false, true), 1024, true))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.Indented, CloseOutput = false })
{
    // Use Formatting.Indented or Formatting.None as required.
    jsonWriter.WriteToken(jsonTextReader);
}
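For context, here is a minimal sketch of how that WriteToken() call might slot into the question's loop. ValueArrayCopier and CopyValueItems are hypothetical names, the sketch assumes the Newtonsoft.Json package is referenced, and unlike the snippet above it reuses a single writer, so the copied objects are written back to back as compact JSON with no separator. It still materializes each individual string value, for the reason explained next.

```csharp
using System.IO;
using System.Text;
using Newtonsoft.Json;

public static class ValueArrayCopier
{
    // Hypothetical helper: copies every object in the root "value" array from input to output
    // with JsonWriter.WriteToken, without building a JToken or calling ToString() on it.
    // Returns the number of objects copied.
    public static int CopyValueItems(TextReader input, Stream output)
    {
        int count = 0;
        using var jsonReader = new JsonTextReader(input);
        using var textWriter = new StreamWriter(output, new UTF8Encoding(false, true), 1024, leaveOpen: true);
        using var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false };

        while (jsonReader.Read())
        {
            // Only the "value" property of the root object (property names there have Depth 1).
            if (jsonReader.TokenType == JsonToken.PropertyName && jsonReader.Depth == 1
                && (string)jsonReader.Value == "value")
            {
                jsonReader.Read(); // move onto the property's value
                if (jsonReader.TokenType != JsonToken.StartArray)
                    continue;

                while (jsonReader.Read() && jsonReader.TokenType != JsonToken.EndArray)
                {
                    if (jsonReader.TokenType == JsonToken.StartObject)
                    {
                        // Copies the object and its children; leaves the reader on its EndObject.
                        jsonWriter.WriteToken(jsonReader);
                        count++;
                    }
                }
            }
        }

        jsonWriter.Flush(); // push buffered text through to the output stream
        return count;
    }
}
```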

However, Json.NET's JsonTextReader cannot read a single string value in "chunks" the way XmlReader.ReadValueChunk() can. It always fully materializes each atomic string value. If your string values are so large that they end up on the large object heap, even JsonWriter.WriteToken() will not prevent those strings from being loaded completely into memory.

As an alternative, you might consider the readers and writers returned by JsonReaderWriterFactory. These readers and writers are used by DataContractJsonSerializer and translate JSON to XML on-the-fly as it is being read and written. Since the base classes for these readers and writers are XmlReader and XmlWriter, they do support reading and writing string values in chunks. Using them appropriately will avoid allocation of strings in the large object heap.
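To make the ReadValueChunk() idea concrete for the question's value[0].Body.Content case, here is a minimal sketch (not part of the original answer) built on the same JsonReaderWriterFactory reader. JsonChunkCopier and CopyStringValue are hypothetical names. The sketch assumes the target is a JSON string, copies only the first matching element, and falls back to reader.Value if the reader reports CanReadValueChunk as false. How much the underlying reader buffers internally is an implementation detail, so it is worth measuring memory use with real payloads.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;

public static class JsonChunkCopier
{
    // Hypothetical helper: finds the first element whose path from the root equals elementPath
    // (e.g. root/value/item/Body/Content for value[0].Body.Content) and copies its string
    // content to output as UTF-8, at most chunkSize chars at a time.
    // Returns the number of chars copied, or -1 if nothing matched.
    public static long CopyStringValue(Stream input, Stream output, string[] elementPath, int chunkSize = 4096)
    {
        var buffer = new char[chunkSize];
        var names = new List<string>(); // local names from the root down to the current element
        long total = 0;

        using var reader = JsonReaderWriterFactory.CreateJsonReader(input, XmlDictionaryReaderQuotas.Max);
        using var writer = new StreamWriter(output, new UTF8Encoding(false), chunkSize, leaveOpen: true);

        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element)
                continue;

            // Keep one entry per ancestor depth, then append this element's name.
            names.RemoveRange(reader.Depth, names.Count - reader.Depth);
            names.Add(reader.LocalName);
            if (!names.SequenceEqual(elementPath))
                continue;

            if (reader.IsEmptyElement)
                return 0;

            reader.Read(); // move from the start tag onto the string's text content
            while (reader.NodeType == XmlNodeType.Text
                   || reader.NodeType == XmlNodeType.Whitespace
                   || reader.NodeType == XmlNodeType.SignificantWhitespace)
            {
                if (reader.CanReadValueChunk)
                {
                    int read;
                    while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
                    {
                        writer.Write(buffer, 0, read); // fixed-size chunk straight to the output
                        total += read;
                    }
                }
                else
                {
                    string value = reader.Value; // fallback: materializes this text node
                    writer.Write(value);
                    total += value.Length;
                }
                reader.Read();
            }

            writer.Flush();
            return total;
        }

        return -1;
    }
}
```

For the question's JSON, the path would be new[] { "root", "value", "item", "Body", "Content" }.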

To stream selected parts of the JSON this way, first define the following extension methods, which copy a selected subset of JSON values from an input stream to an output stream, as specified by a path to the data to be streamed:

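using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class JsonExtensions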
{
    public static void StreamNested(Stream from, Stream to, string [] path)
    {
        var reversed = path.Reverse().ToArray();

        using (var xr = JsonReaderWriterFactory.CreateJsonReader(from, XmlDictionaryReaderQuotas.Max))
        {
            foreach (var subReader in xr.ReadSubtrees(s => s.Select(n => n.LocalName).SequenceEqual(reversed)))
            {
                using (var xw = JsonReaderWriterFactory.CreateJsonWriter(to, Encoding.UTF8, false))
                {
                    subReader.MoveToContent();

                    xw.WriteStartElement("root");
                    xw.WriteAttributes(subReader, true);

                    subReader.Read();

                    while (!subReader.EOF)
                    {
                        if (subReader.NodeType == XmlNodeType.Element && subReader.Depth == 1)
                            xw.WriteNode(subReader, true);
                        else
                            subReader.Read();
                    }

                    xw.WriteEndElement();
                }
            }
        }
    }
}

public static class XmlReaderExtensions
{
    public static IEnumerable<XmlReader> ReadSubtrees(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
    {
        Stack<XName> names = new Stack<XName>();

        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                if (filter(names))
                {
                    using (var subReader = xmlReader.ReadSubtree())
                    {
                        yield return subReader;
                    }
                }
            }

            if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                || xmlReader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }
}

Note that the string [] path argument to StreamNested() is not a JSONPath expression. Instead, it is the path through the hierarchy of XML elements that corresponds to the JSON you want to select, as translated by the XmlReader returned by JsonReaderWriterFactory.CreateJsonReader(). The mapping used for this translation is documented by Microsoft in Mapping Between JSON and XML. To select and stream only the JSON values matching value[*], the XML path required is //root/value/item. Thus, you can select and stream your desired nested objects by doing:

JsonExtensions.StreamNested(inputStream, destinationStream, new[] { "root", "value", "item" });
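Because the path is expressed in terms of the translated XML, it can help to look at that translation directly. The following sketch (a hypothetical JsonXmlViewer.ToXml debugging helper, not from the original answer) renders the XML view of a small JSON sample so you can read off the element names to pass to StreamNested(). It loads the whole sample, so it is only meant for small test documents.

```csharp
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;

public static class JsonXmlViewer
{
    // Hypothetical debugging helper: returns the XML that JsonReaderWriterFactory's
    // reader produces for a (small) JSON document.
    public static string ToXml(string json)
    {
        using var reader = JsonReaderWriterFactory.CreateJsonReader(
            Encoding.UTF8.GetBytes(json), XmlDictionaryReaderQuotas.Max);

        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(sb, settings))
        {
            // With the reader still in its initial state, WriteNode copies the entire document.
            writer.WriteNode(reader, true);
        }
        return sb.ToString();
    }
}
```

Printing JsonXmlViewer.ToXml() for a trimmed-down copy of the question's JSON should show the root, value, item, Body and Content elements that make up the paths used above.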

Notes:
