serialize list of huge composite graphs using protobuf-net causing out-of-memory-exception

Question

I am trying to serialize an object containing a list of very large composite object graphs (~200000 nodes or more) using Protobuf-net. Basically what I want to achieve is to save the complete object into a single file as fast and as compact as possible.

My problem is that I get an out-of-memory-exception while trying to serialize the object. On my machine the exception is thrown when the file size is around 1.5GB. I am running a 64 bit process and using a StreamWriter as input to protobuf-net. Since I am writing directly to a file I suspect that some kind of buffering is taking place within protobuf-net causing the exception. I have tried to use the DataFormat = DataFormat.Group attribute but with no luck so far.

I can avoid the exception by serializing each composite in the list to a separate file but I would prefer to have it all done in one go if possible.

Am I doing something wrong or is it simply not possible to achieve what I want?

Code that illustrates the problem:

using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

class Program
{
    static void Main(string[] args)
    {
        int numberOfTrees = 250;
        int nodesPrTree = 200000;

        var trees = CreateTrees(numberOfTrees, nodesPrTree);
        var forest = new Forest(trees);

        using (var writer = new StreamWriter("model.bin"))
        {
            Serializer.Serialize(writer.BaseStream, forest);
        }

        Console.ReadLine();
    }

    private static Tree[] CreateTrees(int numberOfTrees, int nodesPrTree)
    {
        var trees = new Tree[numberOfTrees];
        for (int i = 0; i < numberOfTrees; i++)
        {
            var root = new Node();
            CreateTree(root, nodesPrTree, 0);
            var binTree = new Tree(root);
            trees[i] = binTree;
        }
        return trees;
    }

    private static void CreateTree(INode tree, int nodesPrTree, int currentNumberOfNodes)
    {
        Queue<INode> q = new Queue<INode>();
        q.Enqueue(tree);
        while (q.Count > 0 && currentNumberOfNodes < nodesPrTree)
        {
            var n = q.Dequeue();
            n.Left = new Node();
            q.Enqueue(n.Left);
            currentNumberOfNodes++;

            n.Right = new Node();
            q.Enqueue(n.Right);
            currentNumberOfNodes++;
        }
    }
}

[ProtoContract]
[ProtoInclude(1, typeof(Node), DataFormat = DataFormat.Group)]
public interface INode
{
    [ProtoMember(2, DataFormat = DataFormat.Group, AsReference = true)]
    INode Parent { get; set; }
    [ProtoMember(3, DataFormat = DataFormat.Group, AsReference = true)]
    INode Left { get; set; }
    [ProtoMember(4, DataFormat = DataFormat.Group, AsReference = true)]        
    INode Right { get; set; }
}

[ProtoContract]
public class Node : INode
{
    INode m_parent;
    INode m_left;
    INode m_right;

    public INode Left
    {
        get
        {
            return m_left;
        }
        set
        {
            m_left = value;
            m_left.Parent = null;
            m_left.Parent = this;
        }
    }

    public INode Right
    {
        get
        {
            return m_right;
        }
        set
        {
            m_right = value;
            m_right.Parent = null;
            m_right.Parent = this;
        }
    }

    public INode Parent
    {
        get
        {
            return m_parent;
        }
        set
        {
            m_parent = value;
        }
    }
}

[ProtoContract]
public class Tree
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public readonly INode Root;

    public Tree(INode root)
    {
        Root = root;
    }
}

[ProtoContract]
public class Forest
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public readonly Tree[] Trees;

    public Forest(Tree[] trees)
    {
        Trees = trees;
    }
}

Stack-trace when the exception is thrown:

at System.Collections.Generic.Dictionary`2.Resize(Int32 newSize, Boolean forceNewHashCodes)
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at ProtoBuf.NetObjectCache.AddObjectKey(Object value, Boolean& existing) in NetObjectCache.cs:line 154
at ProtoBuf.BclHelpers.WriteNetObject(Object value, ProtoWriter dest, Int32 key, NetObjectOptions options) in BclHelpers.cs:line 500
at proto_5(Object , ProtoWriter )

I am trying a workaround where I serialize the trees in the array one at a time to a single file using the SerializeWithLengthPrefix method. Serialization seems to work - I can see the file size increase as each tree in the list is appended to the file. However, when I try to deserialize the trees I get an Invalid wire-type exception. I create a new file when serializing the trees, so the file should be garbage-free - unless I am writing garbage, of course ;-). My serialization and deserialization code is listed below:

using (var writer = new FileStream("model.bin", FileMode.Create))
{
    foreach (var tree in trees)
    {
        Serializer.SerializeWithLengthPrefix(writer, tree, PrefixStyle.Base128);
    }
}

using (var reader = new FileStream("model.bin", FileMode.Open))
{
    var trees = Serializer.DeserializeWithLengthPrefix<Tree[]>(reader, PrefixStyle.Base128);
}
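A likely cause of the wire-type error (sketched here, not verified against this exact data): the trees are written one `Tree` at a time, but read back as a single `Tree[]`, so the stream is interpreted against the wrong contract. Reading them back item by item, the same way they were written, should avoid the mismatch; for a reference type, `DeserializeWithLengthPrefix` returns `null` once the stream is exhausted:

```csharp
// Sketch of a matching read loop: deserialize one Tree per length-prefixed
// record, mirroring how the records were written above.
var trees = new List<Tree>();
using (var reader = new FileStream("model.bin", FileMode.Open))
{
    Tree tree;
    while ((tree = Serializer.DeserializeWithLengthPrefix<Tree>(reader, PrefixStyle.Base128)) != null)
    {
        trees.Add(tree);
    }
}
```

protobuf-net also offers `Serializer.DeserializeItems<Tree>(stream, PrefixStyle.Base128, fieldNumber)`, which lazily enumerates length-prefixed records from a stream and may fit this pattern as well.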

Am I using the method in an incorrect way?

Answer

It wasn't helping that the AsReference code was only respecting default data-format, which means it was trying to hold data in memory so that it can write the object-length prefix back into the data-stream, which is exactly what we don't want here (hence your quite correct use of DataFormat.Group). That will account for buffering for an individual branch of the tree. I've tweaked it locally, and I can definitely confirm that it is now writing forwards-only (the debug build has a convenient ForwardsOnly flag that I can enable which detects this and shouts).
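To make the buffering difference concrete, here is a minimal sketch (reusing the `Node` type from the question; the `Wrapper` type is invented for illustration) of the two data-formats side by side. The comments describe the wire-level behaviour that makes `DataFormat.Group` the right choice for huge graphs:

```csharp
[ProtoContract]
public class Wrapper
{
    // Default format: the sub-message is length-prefixed, so the serializer
    // must know the child's encoded size before writing it -- which means
    // buffering the entire subtree in memory first.
    [ProtoMember(1)]
    public Node Buffered { get; set; }

    // Group format: the child is framed by start-group/end-group tags instead
    // of a length, so the serializer can stream it forwards-only with no
    // look-ahead buffering.
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public Node Streamed { get; set; }
}
```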

With that tweak, I have had it work for 250 x 20,000, but I'm getting secondary problems with the dictionary resizing (even in x64) when working on the 250 x 200,000 - like you say, at around the 1.5GB level. It occurs to me, however, that I might be able to discard one of these (forwards or reverse) respectively when doing each of serialization / deserialization. I would be interested in the stack-trace when it breaks for you - if it is ultimately the dictionary resize, I may need to think about moving to a group of dictionaries...
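For readers wondering where the `Dictionary` in the stack-trace comes from: `AsReference = true` requires the serializer to assign every distinct object an id so that later occurrences can be written as back-references. A rough illustrative model follows - the type and member names below are invented for the sketch, not protobuf-net's real internals (the stack-trace shows the real cache lives in `NetObjectCache`):

```csharp
using System.Collections.Generic;

// Illustrative model of a reference-tracking cache; names are hypothetical.
class ReferenceKeyCache
{
    // One entry per distinct AsReference object: 250 trees x 200,000 nodes
    // is ~50 million entries, and each Dictionary resize briefly needs room
    // for both the old and the doubled backing arrays -- consistent with the
    // OutOfMemoryException thrown from Dictionary.Resize in the stack-trace.
    private readonly Dictionary<object, int> _keys =
        new Dictionary<object, int>(ReferenceEqualityComparer.Instance); // .NET 5+ comparer
    private int _nextKey;

    public int GetKey(object value, out bool existing)
    {
        if (existing = _keys.TryGetValue(value, out int key)) return key;
        _keys[value] = key = _nextKey++;
        return key;
    }
}
```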
