Newtonsoft json.net JsonTextReader Garbage Collector intensive
We are consuming a large (multi-GB) network stream serialised as JSON over HTTP, using the Newtonsoft.Json NuGet package to deserialise the response stream into in-memory records for further manipulation.
Given the excessive data volumes, we are using streaming to receive a chunk of the response at a time, and would like to optimise this process as we are hitting CPU limits.
One of the candidates for optimisation seems to be the JsonTextReader, which is constantly allocating new objects and hence triggering garbage collection.
We have already followed the advice in Newtonsoft's Performance Tips.
I've created a sample .NET console app that simulates this behaviour: as the JsonTextReader reads through the response stream, it allocates strings representing property names and values.
Question: Is there anything else we can tweak or override to reuse already-allocated property name/value instances, given that in the real world 95% of them are repeated (in the test it's the same record, so 100% repetition)?
Sample app:
Install-Package Newtonsoft.Json -Version 12.0.2
Install-Package System.Buffers -Version 4.5.0
Program.cs
using System;
using System.Buffers;
using System.IO;
using System.Linq;
using System.Text;
using Newtonsoft.Json;
namespace JsonNetTester
{
class Program
{
static void Main(string[] args)
{
using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr))
{
// does not seem to make any difference
//jtr.ArrayPool = JsonArrayPool.Instance;
// every read is allocating new objects
while (jtr.Read())
{
}
}
}
// simulating continuous stream of records serialised as json
public class MockedStreamReader : StreamReader
{
private bool initialProvided = false;
private byte[] initialBytes = Encoding.Default.GetBytes("[");
private static readonly byte[] recordBytes;
int nextStart = 0;
static MockedStreamReader()
{
var recordSb = new StringBuilder("{");
// generate [i] of { "Key[i]": "Value[i]" },
Enumerable.Range(0, 50).ToList().ForEach(i =>
{
if (i > 0)
{
recordSb.Append(",");
}
recordSb.Append($"\"Key{i}\": \"Value{i}\"");
});
recordSb.Append("},");
recordBytes = Encoding.Default.GetBytes(recordSb.ToString());
}
public MockedStreamReader() : base(new MemoryStream())
{ }
public override int Read(char[] buffer, int index, int count)
{
// keep on reading the same record in loop
if (this.initialProvided)
{
var start = nextStart;
var length = Math.Min(recordBytes.Length - start, count);
var end = start + length;
nextStart = end >= recordBytes.Length ? 0 : end;
Array.Copy(recordBytes, start, buffer, index, length);
return length;
}
else
{
initialProvided = true;
Array.Copy(initialBytes, 0, buffer, index, initialBytes.Length);
return initialBytes.Length;
}
}
}
// attempt to reuse data in serialisation
public class JsonArrayPool : IArrayPool<char>
{
public static readonly JsonArrayPool Instance = new JsonArrayPool();
public char[] Rent(int minimumLength)
{
return ArrayPool<char>.Shared.Rent(minimumLength);
}
public void Return(char[] array)
{
ArrayPool<char>.Shared.Return(array);
}
}
}
}
Allocations can be observed via Visual Studio Debug > Performance Profiler > .NET Object Allocation Tracking, or via the Performance Monitor "# Gen 0 Collections" / "# Gen 1 Collections" counters.
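As a quick cross-check without a profiler, Gen 0/1 collection counts can also be sampled programmatically around the read loop. The sketch below uses a plain string-allocation loop as a stand-in for the JsonTextReader loop; the loop body and iteration count are illustrative only:

```csharp
using System;

class GcProbe
{
    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);

        // Stand-in for the JsonTextReader read loop: allocate many
        // short-lived strings, the way parsing allocates property
        // names and values.
        for (int i = 0; i < 1_000_000; i++)
        {
            _ = new string('x', 16);
        }

        // The deltas approximate the GC pressure caused by the loop.
        Console.WriteLine($"Gen 0 collections: {GC.CollectionCount(0) - gen0Before}");
        Console.WriteLine($"Gen 1 collections: {GC.CollectionCount(1) - gen1Before}");
    }
}
```

Running the same probe before and after an optimisation gives a rough A/B comparison of GC pressure; the absolute numbers are machine- and runtime-dependent.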
Answering in parts:
Setting JsonTextReader.ArrayPool as you are already doing (which is also shown in DemoTests.ArrayPooling()) should help minimize memory pressure due to allocation of intermediate character arrays during parsing. It will not, however, reduce memory use due to allocation of strings, which seems to be your complaint.
As of Release 12.0.1, Json.NET has the ability to reuse instances of property name strings by setting JsonTextReader.PropertyNameTable to some appropriate JsonNameTable subclass. This mechanism is used during deserialization, by JsonSerializer.SetupReader(), to set a name table on the reader that returns the property names stored by the contract resolver, thus preventing repeated allocation of known property names expected by the serializer.
You, however, are not using a serializer; you are reading directly, and so are not taking advantage of this mechanism. To enable it, you could create your own custom JsonNameTable to cache the property names you actually encounter:
public class AutomaticJsonNameTable : DefaultJsonNameTable
{
    int nAutoAdded = 0;
    int maxToAutoAdd;

    public AutomaticJsonNameTable(int maxToAdd)
    {
        this.maxToAutoAdd = maxToAdd;
    }

    public override string Get(char[] key, int start, int length)
    {
        var s = base.Get(key, start, length);
        if (s == null && nAutoAdded < maxToAutoAdd)
        {
            s = new string(key, start, length);
            Add(s);
            nAutoAdded++;
        }
        return s;
    }
}
And then use it as follows:
const int MaxPropertyNamesToCache = 200; // Set through experiment.
var nameTable = new AutomaticJsonNameTable(MaxPropertyNamesToCache);

using (var sr = new MockedStreamReader())
using (var jtr = new JsonTextReader(sr) { PropertyNameTable = nameTable })
{
    // Process as before.
}
This should substantially reduce memory pressure due to property names.
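The reason this helps can be seen in isolation: a bounded lookup table hands back the previously cached string instance for a repeated name instead of allocating a new one. Below is a minimal stand-alone sketch of that idea; BoundedInternTable is illustrative, not a Json.NET type, and the real DefaultJsonNameTable goes further by hashing the char range directly so the lookup itself allocates nothing:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only -- not a Json.NET type. Interns up to maxEntries
// strings so that repeated lookups of the same name return the same
// string instance instead of a fresh allocation.
class BoundedInternTable
{
    private readonly Dictionary<string, string> entries = new Dictionary<string, string>();
    private readonly int maxEntries;

    public BoundedInternTable(int maxEntries)
    {
        this.maxEntries = maxEntries;
    }

    public string Get(char[] key, int start, int length)
    {
        // This sketch allocates a candidate string to do the lookup;
        // DefaultJsonNameTable avoids even that by hashing the char
        // range directly.
        var candidate = new string(key, start, length);
        if (entries.TryGetValue(candidate, out var cached))
        {
            return cached; // reuse the previously stored instance
        }
        if (entries.Count < maxEntries)
        {
            entries.Add(candidate, candidate);
        }
        return candidate;
    }
}

class Demo
{
    static void Main()
    {
        var table = new BoundedInternTable(100);
        var chars = "Key0".ToCharArray();
        var first = table.Get(chars, 0, chars.Length);
        var second = table.Get(chars, 0, chars.Length);

        // Repeated lookups of the same property name now share one
        // string instance, which is what PropertyNameTable buys you.
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```

The bound on entries mirrors the maxToAutoAdd guard above: without it, adversarial input with unbounded distinct property names could grow the table without limit.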
Note that AutomaticJsonNameTable will only auto-cache a specified, finite number of names, to prevent memory allocation attacks. You'll need to determine this maximum number through experimentation. You could also manually hardcode the addition of expected, known property names.
Note also that, by manually specifying a name table, you prevent use of the serializer-specified name table during deserialization. If your parsing algorithm involves reading through the file to locate specific nested objects, then deserializing those objects, you might get better performance by temporarily nulling out the name table before deserialization, e.g. with the following extension method:
public static class JsonSerializerExtensions
{
    public static T DeserializeWithDefaultNameTable<T>(this JsonSerializer serializer, JsonReader reader)
    {
        JsonNameTable old = null;
        var textReader = reader as JsonTextReader;
        if (textReader != null)
        {
            old = textReader.PropertyNameTable;
            textReader.PropertyNameTable = null;
        }
        try
        {
            return serializer.Deserialize<T>(reader);
        }
        finally
        {
            if (textReader != null)
                textReader.PropertyNameTable = old;
        }
    }
}
It would need to be determined by experimentation whether using the serializer's name table gives better performance than your own (and I have not done any such experiment as part of writing this answer).
There is currently no way to prevent JsonTextReader from allocating strings for property values, even when skipping or otherwise ignoring those values. See please should support real skipping (no materialization of properties/etc) #1021 for a similar enhancement request.
Your only option here would appear to be to fork your own version of JsonTextReader and add this capability yourself. You'd need to find all calls to SetToken(JsonToken.String, _stringReference.ToString(), ...) and replace the call to _stringReference.ToString() with something that doesn't unconditionally allocate memory.
For instance, if you have a large chunk of JSON you would like to skip through, you could add a string DummyValue property to JsonTextReader:
public partial class MyJsonTextReader : JsonReader, IJsonLineInfo
{
    public string DummyValue { get; set; }
}
And then add the following logic where required (in two places currently):
string text = DummyValue ?? _stringReference.ToString();
SetToken(JsonToken.String, text, false);
Or
SetToken(JsonToken.String, DummyValue ?? _stringReference.ToString(), false);
Then, when reading value(s) you know can be skipped, you would set MyJsonTextReader.DummyValue to some stub, say "dummy value".
Alternatively, if you have many non-skippable repeated property values that you can predict in advance, you could create a second JsonNameTable, StringValueNameTable, and, when it is non-null, try looking up the StringReference in it like so:
var text = StringValueNameTable?.Get(_stringReference.Chars, _stringReference.StartIndex, _stringReference.Length) ?? _stringReference.ToString();
Unfortunately, forking your own JsonTextReader may require substantial ongoing maintenance, since you will also need to fork any and all Newtonsoft utilities used by the reader (there are many) and update them for any breaking changes in the original library.
You could also vote up or comment on enhancement request #1021 requesting this ability, or add a similar request yourself.