OutOfMemory Exception with Streams and BsonWriter in Json.Net

Problem description

I'm having a problem using Json.net and creating a large Bson file. I have the following test code:

Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO
Imports Newtonsoft.Json

Public Class Region
    Public Property Id As Integer
    Public Property Name As String
    Public Property FDS_Id As String
End Class

Public Class Regions
    Inherits List(Of Region)

    Public Sub New(capacity As Integer)
        MyBase.New(capacity)
    End Sub
End Class

Module Module1
    Sub Main()
        Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000))
        GC.Collect(0)
    End Sub

    Public Function GetRegionList(count As Integer) As Regions
        ' Return Regions (not List(Of Region)) so the result can be passed to CreateFileBson_Stream.
        Dim regions As New Regions(count)
        For lp = 0 To count - 1
            regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString})
        Next
        Return regions
    End Function

    Public Function CreateFileBson_Stream(regions As Regions) As Long
        Dim sw As New Stopwatch
        sw.Start()
        Dim lp = 0

        Using stream = New StreamWriter("c:\atlas\regionsStream.bson")
            Using writer = New Bson.BsonWriter(stream.BaseStream)
                writer.WriteStartArray()

                For Each item In regions
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()

                    lp += 1
                    If lp Mod 1000000 = 0 Then
                        writer.Flush()
                        stream.Flush()
                        stream.BaseStream.Flush()
                    End If
                Next

                writer.WriteEndArray()
            End Using
        End Using

        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module

I have used FileStream instead of StreamWriter in the first using statement and it makes no difference.

CreateFileBson_Stream fails at just over 3 million records with an OutOfMemory exception. The memory profiler in Visual Studio shows memory continuing to climb even though I'm flushing everything I can.

The list of 5 million regions comes to about 468 MB in memory.

Interestingly, if I use the following code to produce JSON instead, it works and memory stays steady at about 500 MB:

    Public Function CreateFileJson_Stream(regions As Regions) As Long
        Dim sw As New Stopwatch
        sw.Start()
        Using stream = New StreamWriter("c:\atlas\regionsStream.json")
            Using writer = New JsonTextWriter(stream)
                writer.WriteStartArray()

                For Each item In regions
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()
                Next

                writer.WriteEndArray()
            End Using
        End Using
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

I'm pretty certain this is a problem with the BsonWriter but can't see what else I can do. Any ideas?

Recommended answer

The reason you are running out of memory is as follows. According to the BSON specification, every object or array - called documents in the standard - must contain at the beginning a count of the total number of bytes comprising the document:

document    ::=     int32 e_list "\x00"     BSON Document. int32 is the total number of bytes comprising the document.
e_list      ::=     element e_list  
    |   ""  
element     ::=     "\x01" e_name double    64-bit binary floating point
    |   "\x02" e_name string    UTF-8 string
    |   "\x03" e_name document  Embedded document
    |   "\x04" e_name document  Array
    |   ...

Thus when writing the root object or array, the total number of bytes to be written to the file must be precalculated.
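
To make the length prefix concrete, here is a minimal sketch that hand-encodes the one-field document { "a": 1 } following the grammar above. The class and method names (BsonLayoutSketch, EncodeSingleInt32) are made up for this illustration, and the element type byte 0x10 (32-bit integer) comes from the BSON spec rather than from the excerpt shown above.

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonLayoutSketch
{
    // Hand-encodes a document with a single int32 element, e.g. { "a": 1 }.
    // Hypothetical helper for illustration only; not part of Json.NET.
    public static byte[] EncodeSingleInt32(string name, int value)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // The total must be known BEFORE the first content byte is written:
        // 4 (length prefix) + 1 (type byte) + name + 1 (name terminator)
        // + 4 (int32 value) + 1 (document terminator).
        int total = 4 + 1 + nameBytes.Length + 1 + 4 + 1;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(total);        // int32, little-endian
            w.Write((byte)0x10);   // element type: 32-bit integer
            w.Write(nameBytes);    // e_name as UTF-8 ...
            w.Write((byte)0);      // ... terminated by NUL
            w.Write(value);        // int32 payload, little-endian
            w.Write((byte)0);      // end of document
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

Calling BitConverter.ToString(BsonLayoutSketch.EncodeSingleInt32("a", 1)) gives 0C-00-00-00-10-61-00-01-00-00-00-00: the first four bytes (12) already state the size of everything that follows, which is easy for a tiny document but not for a root array of unknown length.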

Newtonsoft's BsonDataWriter and underlying BsonBinaryWriter implement this by caching all tokens to be written in a tree, then when the contents of the root token have been finalized, recursively calculating the sizes before writing the tree out. (The alternatives would have been to make the application (i.e. your code) somehow precalculate this information -- practically impossible -- or to seek back and forth in the output stream to write this information, possibly only for those streams for which Stream.CanSeek == true.) You are getting the OutOfMemory exception because your system has insufficient resources to hold the token tree.
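
To see why a cached tree grows with the input, here is a deliberately simplified model of the approach just described. The Token, Int32Token, StringToken and DocumentToken classes are hypothetical names invented for this sketch and are not Newtonsoft's actual types; only the size arithmetic follows the BSON grammar.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified token model; NOT Newtonsoft's real classes.
public abstract class Token
{
    public abstract int CalculateSize();
}

public sealed class Int32Token : Token
{
    public override int CalculateSize() => 4; // fixed-width int32
}

public sealed class StringToken : Token
{
    private readonly string _value;
    public StringToken(string value) { _value = value; }

    // int32 byte count + UTF-8 bytes + NUL terminator
    public override int CalculateSize() => 4 + Encoding.UTF8.GetByteCount(_value) + 1;
}

public sealed class DocumentToken : Token
{
    // Every child stays referenced here until the root document is closed,
    // so memory use is proportional to the whole output.
    private readonly List<KeyValuePair<string, Token>> _children =
        new List<KeyValuePair<string, Token>>();

    public int Count => _children.Count;

    public void Add(string name, Token value) =>
        _children.Add(new KeyValuePair<string, Token>(name, value));

    public override int CalculateSize()
    {
        int size = 4; // int32 length prefix
        foreach (var child in _children)
        {
            // type byte + NUL-terminated element name + recursively sized value
            size += 1 + Encoding.UTF8.GetByteCount(child.Key) + 1 + child.Value.CalculateSize();
        }
        return size + 1; // trailing 0x00
    }
}
```

In this model, nothing can be written until CalculateSize has run on the root, and the root cannot be sized until its last child has been added. For the question's input that would mean holding five million child documents, each with three value tokens, before the first byte reaches the file.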

For comparison, the JSON standard does not require byte counts or sizes to be written anywhere in the file. Thus JsonTextWriter can stream your serialized array contents out immediately, without the need to cache anything.

As a workaround, based on the BSON spec and BsonBinaryWriter I have created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It doesn't require caching the entire BSON document in memory, but rather seeks to the beginning of the stream to write the final byte count:

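using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;
using Newtonsoft.Json.Serialization;

public static partial class BsonExtensions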
{
    const int BufferSize = 256;

    public static void SerializeEnumerable<TItem>(IEnumerable<TItem> enumerable, Stream stream, JsonSerializerSettings settings = null)
    {
        // Created based on https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs
        // And http://bsonspec.org/spec.html
        if (enumerable == null || stream == null)
            throw new ArgumentNullException();
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("!stream.CanSeek || !stream.CanWrite");

        var serializer = JsonSerializer.CreateDefault(settings);
        var contract = serializer.ContractResolver.ResolveContract(typeof(TItem));
        BsonType rootType;
        if (contract is JsonObjectContract || contract is JsonDictionaryContract)
            rootType = BsonType.Object;
        else if (contract is JsonArrayContract)
            rootType = BsonType.Array;
        else
            // Arrays of primitives are not implemented yet.
            throw new JsonSerializationException(string.Format("Item type \"{0}\" not implemented.", typeof(TItem)));

        stream.Flush(); // Just in case.
        var initialPosition = stream.Position;

        var buffer = new byte[BufferSize];

        WriteInt(stream, (int)0, buffer); // Placeholder; the real size is written once the end position is known.

        ulong index = 0;
        foreach (var item in enumerable)
        {
            if (item == null)
            {
                stream.WriteByte(unchecked((byte)BsonType.Null));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
            }
            else
            {
                stream.WriteByte(unchecked((byte)rootType));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
                using (var bsonWriter = new BsonDataWriter(stream) { CloseOutput = false })
                {
                    serializer.Serialize(bsonWriter, item);
                }
            }
            index++;
        }

        stream.WriteByte((byte)0);
        stream.Flush();

        var finalPosition = stream.Position;
        stream.Position = initialPosition;

        var size = checked((int)(finalPosition - initialPosition));
        WriteInt(stream, size, buffer); // CALCULATED SIZE.

        stream.Position = finalPosition;
    }

    private static readonly Encoding Encoding = new UTF8Encoding(false);

    private static void WriteString(Stream stream, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                stream.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                stream.Write(bytes, 0, bytes.Length);
            }
        }

        stream.WriteByte((byte)0);
    }

    private static void WriteInt(Stream stream, int value, byte[] buffer)
    {
        unchecked
        {
            buffer[0] = (byte)value;
            buffer[1] = (byte)(value >> 8);
            buffer[2] = (byte)(value >> 16);
            buffer[3] = (byte)(value >> 24);
        }
        stream.Write(buffer, 0, 4);
    }

    enum BsonType : sbyte
    {
        // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs
        // And also http://bsonspec.org/spec.html
        Number = 1,
        String = 2,
        Object = 3,
        Array = 4,
        Binary = 5,
        Undefined = 6,
        Oid = 7,
        Boolean = 8,
        Date = 9,
        Null = 10,
        Regex = 11,
        Reference = 12,
        Code = 13,
        Symbol = 14,
        CodeWScope = 15,
        Integer = 16,
        TimeStamp = 17,
        Long = 18,
        MinKey = -1,
        MaxKey = 127
    }
}

And then call it as follows:

BsonExtensions.SerializeEnumerable(regions, stream)

Notes:

  • You could use the method above to serialize to a local FileStream or a MemoryStream -- but not, say, a DeflateStream, which cannot be repositioned.

  • Serializing enumerables of primitives is not implemented, but could be.

  • In Release 10.0.1, Newtonsoft moved BSON processing into a separate NuGet package, Newtonsoft.Json.Bson, and replaced BsonWriter with BsonDataWriter. If you are using an earlier version of Newtonsoft, the answer above applies equally to the old BsonWriter.

  • Since Json.NET is written in C# and my primary language is C#, the workaround is also in C#. If you need this converted to VB.NET, let me know and I can try.
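
For the non-seekable case mentioned in the notes (a DeflateStream, for example), one possible approach is to stage the output in a seekable temporary file and then copy it to the real destination. This is only a sketch under that assumption: NonSeekableSketch and WriteViaTempFile are hypothetical names, and the extra disk pass is the price paid for keeping memory use flat.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class NonSeekableSketch
{
    // Runs a writer that needs a seekable stream against a temporary file,
    // then copies the finished bytes into a destination that cannot seek.
    // Hypothetical helper for illustration; not part of Json.NET.
    public static void WriteViaTempFile(Action<Stream> writeSeekable, Stream destination)
    {
        if (writeSeekable == null) throw new ArgumentNullException(nameof(writeSeekable));
        if (destination == null) throw new ArgumentNullException(nameof(destination));

        string tempPath = Path.GetTempFileName();
        try
        {
            using (var temp = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
            {
                writeSeekable(temp);          // may seek back to patch the length prefix
                temp.Position = 0;            // rewind before copying
                temp.CopyTo(destination);     // streamed copy, bounded buffer
            }
        }
        finally
        {
            File.Delete(tempPath);            // always clean up the staging file
        }
    }
}
```

Assuming the BsonExtensions helper above is available, the call would look like NonSeekableSketch.WriteViaTempFile(s => BsonExtensions.SerializeEnumerable(regions, s), deflateStream), where deflateStream is a DeflateStream opened with CompressionMode.Compress.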

Demo fiddle with some simple unit tests here.
