Parse huge OData JSON by streaming certain sections of the JSON to avoid the LOH
Problem description
I have an OData response as JSON (a few MB in size), and the requirement is to stream certain parts of the JSON without ever loading them into memory.
For example: when I read the property "value[0].Body.Content" in the JSON below (whose value will be several MB), I want to stream the value without deserializing it into a string object. So basically, read the value into a fixed-size byte array and write that byte array to the destination stream, repeating until the value has been fully processed.
JSON:
{
    "@odata.context": "https://localhost:5555/api/v2.0/$metadata#Me/Messages",
    "value": [
        {
            "@odata.id": "https://localhost:5555/api/v2.0/",
            "@odata.etag": "W/\"Something\"",
            "Id": "vccvJHDSFds43hwy98fh",
            "CreatedDateTime": "2018-12-01T01:47:53Z",
            "LastModifiedDateTime": "2018-12-01T01:47:53Z",
            "ChangeKey": "SDgf43tsdf",
            "WebLink": "https://localhost:5555/?ItemID=dfsgsdfg9876ijhrf",
            "Body": {
                "ContentType": "HTML",
                "Content": "<html>\r\n<body>Huge Data Here\r\n</body>\r\n</html>\r\n"
            },
            "ToRecipients": [
                {
                    "EmailAddress": {
                        "Name": "ME",
                        "Address": "me@me.com"
                    }
                }
            ],
            "CcRecipients": [],
            "BccRecipients": [],
            "ReplyTo": [],
            "Flag": {
                "FlagStatus": "NotFlagged"
            }
        }
    ],
    "@odata.nextLink": "http://localhost:5555/rest/jersey/sleep?%24filter=LastDeliveredDateTime+ge+2018-12-01+and+LastDeliveredDateTime+lt+2018-12-02&%24top=50&%24skip=50"
}
Approaches tried:
1. Newtonsoft
I initially tried Newtonsoft streaming, but it internally converts the data into a string and loads it into memory. This makes the LOH shoot up, and the memory is not released until compaction happens. We have a memory limit for our worker process and cannot keep this in memory.
**code:**
using (var jsonTextReader = new JsonTextReader(sr))
{
    var pool = new CustomArrayPool();
    // Checking if pooling will help with memory
    jsonTextReader.ArrayPool = pool;
    while (jsonTextReader.Read())
    {
        if (jsonTextReader.TokenType == JsonToken.PropertyName
            && ((string)jsonTextReader.Value).Equals("value"))
        {
            jsonTextReader.Read();
            if (jsonTextReader.TokenType == JsonToken.StartArray)
            {
                while (jsonTextReader.Read())
                {
                    if (jsonTextReader.TokenType == JsonToken.StartObject)
                    {
                        var Current = JToken.Load(jsonTextReader);
                        // By now, the LOH shoots up.
                        // Would like to avoid the code below, which converts this JToken back to a byte array.
                        var bytes = Encoding.ASCII.GetBytes(Current.ToString());
                        destinationStream.Write(bytes, 0, bytes.Length);
                    }
                    else if (jsonTextReader.TokenType == JsonToken.EndArray)
                    {
                        break;
                    }
                }
            }
        }
        if (jsonTextReader.TokenType == JsonToken.StartObject)
        {
            var Current = JToken.Load(jsonTextReader);
            // Do some processing with Current
            var bytes = Encoding.ASCII.GetBytes(Current.ToString());
            destinationStream.Write(bytes, 0, bytes.Length);
        }
    }
}
2. OData.Net
I wondered whether this is doable with the OData.Net library, since it appears to support streaming of string fields. But I couldn't get far: I ended up creating a model for the data, which would mean the value gets converted into a single string object of several MB.
**code:**
ODataMessageReaderSettings settings = new ODataMessageReaderSettings();
IODataResponseMessage responseMessage = new InMemoryMessage { Stream = stream };
responseMessage.SetHeader("Content-Type", "application/json;odata.metadata=minimal;");
// ODataMessageReader reader = new ODataMessageReader((IODataResponseMessage)message, settings, GetEdmModel());
ODataMessageReader reader = new ODataMessageReader(responseMessage, settings, new EdmModel());
var oDataResourceReader = reader.CreateODataResourceReader();
var property = reader.ReadProperty();
Any idea how to parse this JSON in parts using OData.Net or Newtonsoft and stream the values of certain fields?
Is manually parsing the stream the only way to do this?
Recommended answer
If you are copying portions of JSON from one stream to another, you can do this more efficiently with JsonWriter.WriteToken(JsonReader), thus avoiding the intermediate Current = JToken.Load(jsonTextReader) and Encoding.ASCII.GetBytes(Current.ToString()) representations and their associated memory overhead:
using (var textWriter = new StreamWriter(destinationStream, new UTF8Encoding(false, true), 1024, true))
using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.Indented, CloseOutput = false })
{
    // Use Formatting.Indented or Formatting.None as required.
    jsonWriter.WriteToken(jsonTextReader);
}
However, Json.NET's JsonTextReader does not have the ability to read a single string value in "chunks" in the same way as XmlReader.ReadValueChunk(). It will always fully materialize each atomic string value. If your string values are so large that they go on the large object heap, even using JsonWriter.WriteToken() will not prevent these strings from being completely loaded into memory.
As an alternative, you might consider the readers and writers returned by JsonReaderWriterFactory. These readers and writers are used by DataContractJsonSerializer and translate JSON to XML on the fly as it is being read and written. Since the base classes for these readers and writers are XmlReader and XmlWriter, they do support reading and writing string values in chunks. Using them appropriately will avoid allocating strings on the large object heap.
To do this, first define the following extension methods, which copy a selected subset of JSON values from an input stream to an output stream, as specified by a path to the data to be streamed:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class JsonExtensions
{
    public static void StreamNested(Stream from, Stream to, string [] path)
    {
        // The element-name stack enumerates innermost-first, so compare against the reversed path.
        var reversed = path.Reverse().ToArray();
        using (var xr = JsonReaderWriterFactory.CreateJsonReader(from, XmlDictionaryReaderQuotas.Max))
        {
            foreach (var subReader in xr.ReadSubtrees(s => s.Select(n => n.LocalName).SequenceEqual(reversed)))
            {
                using (var xw = JsonReaderWriterFactory.CreateJsonWriter(to, Encoding.UTF8, false))
                {
                    subReader.MoveToContent();
                    // Re-root the matched subtree so that it is written as a standalone JSON value.
                    xw.WriteStartElement("root");
                    xw.WriteAttributes(subReader, true);
                    subReader.Read();
                    while (!subReader.EOF)
                    {
                        if (subReader.NodeType == XmlNodeType.Element && subReader.Depth == 1)
                            xw.WriteNode(subReader, true);
                        else
                            subReader.Read();
                    }
                    xw.WriteEndElement();
                }
            }
        }
    }
}

public static class XmlReaderExtensions
{
    public static IEnumerable<XmlReader> ReadSubtrees(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
    {
        // Tracks the names of the currently open elements, innermost on top.
        Stack<XName> names = new Stack<XName>();
        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                if (filter(names))
                {
                    using (var subReader = xmlReader.ReadSubtree())
                    {
                        yield return subReader;
                    }
                }
            }
            if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                || xmlReader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }
}
Now, the string [] path argument to StreamNested() is not any sort of JSONPath path. Instead, it is a path through the hierarchy of XML elements that corresponds to the JSON you want to select, as translated by the XmlReader returned by JsonReaderWriterFactory.CreateJsonReader(). The mapping used for this translation is, in turn, documented by Microsoft in Mapping Between JSON and XML. To select and stream only those JSON values matching value[*], the XML path required is //root/value/item. Thus, you can select and stream your desired nested objects by doing:
JsonExtensions.StreamNested(inputStream, destinationStream, new[] { "root", "value", "item" });
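If only the raw text of one large string property is needed (for example Body.Content from the question), the chunked reading mentioned earlier can be sketched roughly as follows. This is a minimal sketch, not part of the original answer: the class JsonChunkedCopy, the method CopyStringProperty, and the chunkSize parameter are hypothetical names introduced for illustration. It assumes the property is matched by its local name alone and that the reader returned by JsonReaderWriterFactory.CreateJsonReader() supports XmlReader.ReadValueChunk().

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;

public static class JsonChunkedCopy
{
    // Hypothetical helper (not part of any library): copies the text of the first
    // JSON string property named propertyName to the 'to' stream as UTF-8.
    // Returns false if no such property is found.
    public static bool CopyStringProperty(Stream from, Stream to, string propertyName, int chunkSize = 4096)
    {
        // A small, reusable buffer that stays far below the large object heap threshold.
        var buffer = new char[chunkSize];
        using (var reader = JsonReaderWriterFactory.CreateJsonReader(from, XmlDictionaryReaderQuotas.Max))
        using (var writer = new StreamWriter(to, new UTF8Encoding(false), 1024, true)) // leaves 'to' open
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.LocalName != propertyName)
                    continue;
                if (reader.IsEmptyElement)
                    return true; // empty value, nothing to copy

                // Copy every text node inside the element, one chunk at a time.
                while (reader.Read() && reader.NodeType != XmlNodeType.EndElement)
                {
                    if (reader.NodeType != XmlNodeType.Text)
                        continue;
                    int read;
                    while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
                        writer.Write(buffer, 0, read);
                }
                return true;
            }
        }
        return false;
    }
}
```

A call such as JsonChunkedCopy.CopyStringProperty(inputStream, destinationStream, "Content") would then write the unescaped text of the first Content property to destinationStream. Because matching is by name only, a property called Content at any depth is selected; a stricter version could track the element path in the same way ReadSubtrees() does.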
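Notes:
- The mapping from JSON to XML is not always obvious, so it can be easier to load some sample JSON through the same reader, inspect the resulting XML, and then determine the correct XML path observationally.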
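One way to find the XML path observationally is to load a small sample of the JSON into an XDocument and print it. The sketch below is an illustration under assumptions, not the answer's original code: the class JsonXmlInspector and the method LoadAsXml are hypothetical names. It loads the whole sample into memory, so it is only suitable for small samples, not for the multi-megabyte payload itself.

```csharp
using System;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class JsonXmlInspector
{
    // Hypothetical helper: parses a small JSON sample with the same reader that
    // StreamNested() uses and returns the XML it is translated to.
    public static XDocument LoadAsXml(string json)
    {
        var bytes = Encoding.UTF8.GetBytes(json);
        using (var reader = JsonReaderWriterFactory.CreateJsonReader(bytes, XmlDictionaryReaderQuotas.Max))
        {
            return XDocument.Load(reader);
        }
    }
}
```

For instance, printing JsonXmlInspector.LoadAsXml("{\"value\":[{\"Id\":\"a\"}]}") shows a root element containing a value element, which in turn contains one item element per array entry. That is where the path root, value, item comes from.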
For a related question, see JObject.SelectToken Equivalent in .NET.