Newtonsoft json.net JsonTextReader Garbage Collector intensive


Question


We are consuming a large (GBs) network stream serialised as JSON over HTTP, using the Newtonsoft.Json NuGet package to deserialise the response stream into in-memory records for further manipulation.

Given the excessive data volumes, we are using streaming to receive a chunk of the response at a time, and would like to optimise this process as we are hitting CPU limits.

One of the candidates for optimisation seems to be JsonTextReader, which constantly allocates new objects and hence triggers garbage collection.

We have followed advice from Newtonsoft Performance Tips.

I've created a sample .NET console app that simulates this behaviour: as JsonTextReader reads through the response stream, it allocates new objects, including strings representing property names and values.

Question: Is there anything else we can tweak/override to reuse already allocated property name/value instances, given that in the real world 95% of them are repeated (in the test it's the same record, so 100% repetition)?

Sample app:

Install-Package Newtonsoft.Json -Version 12.0.2
Install-Package System.Buffers -Version 4.5.0

Program.cs

using System;
using System.Buffers;
using System.IO;
using System.Linq;
using System.Text;
using Newtonsoft.Json;

namespace JsonNetTester
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var sr = new MockedStreamReader())
            using (var jtr = new JsonTextReader(sr))
            {
                // does not seem to make any difference
                //jtr.ArrayPool = JsonArrayPool.Instance;

                // every read is allocating new objects
                while (jtr.Read())
                {
                }
            }
        }

        // simulating continuous stream of records serialised as json
        public class MockedStreamReader : StreamReader
        {
            private bool initialProvided = false;
            private byte[] initialBytes = Encoding.Default.GetBytes("[");
            private static readonly byte[] recordBytes;
            int nextStart = 0;

            static MockedStreamReader()
            {
                var recordSb = new StringBuilder("{");

                // generate [i] of { "Key[i]": "Value[i]" }, 
                Enumerable.Range(0, 50).ToList().ForEach(i =>
                {
                    if (i > 0)
                    {
                        recordSb.Append(",");
                    }
                    recordSb.Append($"\"Key{i}\": \"Value{i}\"");
                });

                recordSb.Append("},");
                recordBytes = Encoding.Default.GetBytes(recordSb.ToString());
            }

            public MockedStreamReader() : base(new MemoryStream())
            {   }

            public override int Read(char[] buffer, int index, int count)
            {
                // keep on reading the same record in loop
                if (this.initialProvided)
                {
                    var start = nextStart;
                    var length = Math.Min(recordBytes.Length - start, count);
                    var end = start + length;
                    nextStart = end >= recordBytes.Length ? 0 : end;
                    Array.Copy(recordBytes, start, buffer, index, length);
                    return length;
                }
                else
                {
                    initialProvided = true;
                    Array.Copy(initialBytes, buffer, initialBytes.Length);
                    return initialBytes.Length;
                }
            }
        }

        // attempt to reuse data in serialisation
        public class JsonArrayPool : IArrayPool<char>
        {
            public static readonly JsonArrayPool Instance = new JsonArrayPool();

            public char[] Rent(int minimumLength)
            {
                return ArrayPool<char>.Shared.Rent(minimumLength);
            }

            public void Return(char[] array)
            {
                ArrayPool<char>.Shared.Return(array);
            }
        }
    }
}

Allocations can be observed via Visual Studio Debug > Performance Profiler > .NET Object Allocation Tracking, or via the Performance Monitor # Gen 0 Collections / # Gen 1 Collections counters.
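The same pressure can also be seen without a profiler by sampling `GC.CollectionCount(0)` around the read loop. A minimal sketch (the payload size and record shape here are invented for illustration, and a finite `StringReader` stands in for the endless mocked stream so the program terminates):

```csharp
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

class GcPressureDemo
{
    static void Main()
    {
        // Arbitrary test payload: many identical small records,
        // mirroring the repetition in the mocked stream above.
        var json = "[" + string.Join(",",
            Enumerable.Range(0, 100_000).Select(_ => "{\"Key\":\"Value\"}")) + "]";

        var gen0Before = GC.CollectionCount(0);

        using (var jtr = new JsonTextReader(new StringReader(json)))
        {
            // Each Read() allocates strings for property names and values.
            while (jtr.Read())
            {
            }
        }

        // Gen 0 collections climb with payload size.
        Console.WriteLine($"Gen 0 collections during parse: {GC.CollectionCount(0) - gen0Before}");
    }
}
```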

Solution

Answering in parts:

  1. Setting JsonTextReader.ArrayPool as you are doing already (which is also shown in DemoTests.ArrayPooling()) should help minimize memory pressure due to allocation of intermediate character arrays during parsing. It will not, however, reduce memory use due to allocation of strings, which seems to be your complaint.

  2. As of Release 12.0.1, Json.NET has the ability to reuse instances of property name strings by setting JsonTextReader.PropertyNameTable to some appropriate JsonNameTable subclass.

    This mechanism is used during deserialization, by JsonSerializer.SetupReader(), to set a name table on the reader that returns the property names stored by the contract resolver, thus preventing repeated allocation of known property names expected by the serializer.

    You, however, are not using a serializer, you are reading directly, and so are not taking advantage of this mechanism. To enable it, you could create your own custom JsonNameTable to cache the property names you actually encounter:

    public class AutomaticJsonNameTable : DefaultJsonNameTable
    {
        int nAutoAdded = 0;
        int maxToAutoAdd;
    
        public AutomaticJsonNameTable(int maxToAdd)
        {
            this.maxToAutoAdd = maxToAdd;
        }
    
        public override string Get(char[] key, int start, int length)
        {
            var s = base.Get(key, start, length);
    
            if (s == null && nAutoAdded < maxToAutoAdd)
            {
                s = new string(key, start, length);
                Add(s);
                nAutoAdded++;
            }
    
            return s;
        }
    }
    

    And then use it as follows:

    const int MaxPropertyNamesToCache = 200; // Set through experiment.
    
    var nameTable = new AutomaticJsonNameTable(MaxPropertyNamesToCache);
    
    using (var sr = new MockedStreamReader())
    using (var jtr = new JsonTextReader(sr) { PropertyNameTable = nameTable })
    {
        // Process as before.
    }
    

    This should substantially reduce memory pressure due to property names.

    Note that AutomaticJsonNameTable will only auto-cache a specified, finite number of names to prevent memory allocation attacks. You'll need to determine this maximum number through experimentation. You could also manually hardcode the addition of expected, known property names.
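    That manual seeding can be sketched with the stock DefaultJsonNameTable directly (a minimal sketch, not the author's code: the JSON payload and key name are invented, and it assumes DefaultJsonNameTable.Add() returns the interned instance, which is how the auto-caching override above uses it):

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

class NameTableSeedDemo
{
    static void Main()
    {
        // Seed the table up front with a property name we know will occur.
        var nameTable = new DefaultJsonNameTable();
        var key0 = nameTable.Add("Key0");

        using (var jtr = new JsonTextReader(new StringReader("{\"Key0\":\"Value0\"}"))
        {
            PropertyNameTable = nameTable
        })
        {
            while (jtr.Read())
            {
                if (jtr.TokenType == JsonToken.PropertyName)
                {
                    // The reader hands back the interned instance instead of
                    // allocating a fresh string for the property name.
                    Console.WriteLine(ReferenceEquals(jtr.Value, key0));
                }
            }
        }
    }
}
```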

    Note also that, by manually specifying a name table, you prevent use of the serializer-specified name table during deserialization. If your parsing algorithm involves reading through the file to locate specific nested objects, then deserializing those objects, you might get better performance by temporarily nulling out the name table before deserialization, e.g. with the following extension method:

    public static class JsonSerializerExtensions
    {
        public static T DeserializeWithDefaultNameTable<T>(this JsonSerializer serializer, JsonReader reader)
        {
            JsonNameTable old = null;
            var textReader = reader as JsonTextReader;
            if (textReader != null)
            {
                old = textReader.PropertyNameTable;
                textReader.PropertyNameTable = null;
            }
            try
            {
                return serializer.Deserialize<T>(reader);
            }
            finally
            {
                if (textReader != null)
                    textReader.PropertyNameTable = old;
            }
        }
    }
    

    It would need to be determined by experimentation whether using the serializer's name table gives better performance than your own (and I have not done any such experiment as part of writing this answer).

  3. There is currently no way to prevent JsonTextReader from allocating strings for property values even when skipping or otherwise ignoring those values. See please should support real skipping (no materialization of properties/etc) #1021 for a similar enhancement request.

    Your only option here would appear to be to fork your own version of JsonTextReader and add this capability yourself. You'd need to find all calls to SetToken(JsonToken.String, _stringReference.ToString(), ...) and replace the call to _stringReference.ToString() with something that doesn't unconditionally allocate memory.

    For instance, if you have a large chunk of JSON you would like to skip through, you could add a string DummyValue to JsonTextReader:

    public partial class MyJsonTextReader : JsonReader, IJsonLineInfo
    {
        public string DummyValue { get; set; }
    

    And then add the following logic where required (in two places currently):

    string text = DummyValue ?? _stringReference.ToString();
    SetToken(JsonToken.String, text, false);
    

    Or

    SetToken(JsonToken.String, DummyValue ?? _stringReference.ToString(), false);
    

    Then, when reading value(s) you know can be skipped, you would set MyJsonTextReader.DummyValue to some stub, say "dummy value".
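    Put together, the driver code for such a fork might look like this (illustrative pseudocode only: MyJsonTextReader, DummyValue and the skip predicate are all hypothetical, since this capability does not exist in the stock library):

```csharp
// Hypothetical: MyJsonTextReader is the forked reader sketched above;
// none of this compiles against the stock Newtonsoft.Json package.
using (var reader = new MyJsonTextReader(streamReader))
{
    while (reader.Read())
    {
        // Suppress string allocation for subtrees we intend to ignore;
        // IsSkippablePath is a placeholder for your own predicate.
        reader.DummyValue = IsSkippablePath(reader.Path) ? "dummy value" : null;
    }
}
```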

    Alternatively, if you have many non-skippable repeated property values that you can predict in advance, you could create a second JsonNameTable StringValueNameTable and, when non-null, try looking up the StringReference in it like so:

    var text = StringValueNameTable?.Get(_stringReference.Chars, _stringReference.StartIndex, _stringReference.Length) ?? _stringReference.ToString();
    

    Unfortunately, forking your own JsonTextReader may require substantial ongoing maintenance, since you will also need to fork any and all Newtonsoft utilities used by the reader (there are many) and update them to any breaking changes in the original library.

    You could also vote up or comment on enhancement request #1021 requesting this ability, or add a similar request yourself.
