改善结构的大名单二进制序列化性能 [英] Improve Binary Serialization Performance for large List of structs

查看:137
本文介绍了改善结构的大名单二进制序列化性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个结构持有三维坐标中的3个整数。在测试中,我已经把一个List<> 100万随机点,然后用二进制序列化到内存流

I have a structure holding 3d co-ordinates in 3 ints. In a test I've put together a List<> of 1 million random points and then used Binary serialization to a memory stream.

存储流进来了〜21 MB的 - 这似乎是非常低效1000000点* 3 * COORDS 4个字节应该站出来为11MB最低

The memory stream is coming in a ~ 21 MB - which seems very inefficient as 1000000 points * 3 coords * 4 bytes should come out at 11MB minimum

它也正在对我的试验台〜3秒。

Its also taking ~ 3 seconds on my test rig.

任何想法,以改善性能和/或大小?

Any ideas for improving performance and/or size?

(我没有保持ISerialzable接口,如果有帮助,我可以写出来,直接到内存流)

(I don't have to keep the ISerialzable interface if it helps, I could write out directly to a memory stream)

修改 - 从下面的答案,我已经把比较的BinaryFormatter序列化摊牌,原始的BinaryWriter和protobuf的

EDIT - From answers below I've put together a serialization showdown comparing BinaryFormatter, 'Raw' BinaryWriter and Protobuf

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using ProtoBuf;

namespace asp_heatmap.test
{
    [Serializable()] // For .NET BinaryFormatter
    [ProtoContract] // For Protobuf
    public class Coordinates : ISerializable
    {
        [Serializable()]
        [ProtoContract]
        public struct CoOrd
        {
            public CoOrd(int x, int y, int z)
            {
                this.x = x;
                this.y = y;
                this.z = z;
            }
            [ProtoMember(1)]            
            public int x;
            [ProtoMember(2)]
            public int y;
            [ProtoMember(3)]
            public int z;
        }

        internal Coordinates()
        {
        }

        [ProtoMember(1)]
        public List<CoOrd> Coords = new List<CoOrd>();

        public void SetupTestArray()
        {
            Random r = new Random();
            List<CoOrd> coordinates = new List<CoOrd>();
            for (int i = 0; i < 1000000; i++)
            {
                Coords.Add(new CoOrd(r.Next(), r.Next(), r.Next()));
            }
        }

        #region Using Framework Binary Formatter Serialization

        void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Coords", this.Coords);
        }

        internal Coordinates(SerializationInfo info, StreamingContext context)
        {
            this.Coords = (List<CoOrd>)info.GetValue("Coords", typeof(List<CoOrd>));
        }

        #endregion

        # region 'Raw' Binary Writer serialization

        public MemoryStream RawSerializeToStream()
        {
            MemoryStream stream = new MemoryStream(Coords.Count * 3 * 4 + 4);
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(Coords.Count);
            foreach (CoOrd point in Coords)
            {
                writer.Write(point.x);
                writer.Write(point.y);
                writer.Write(point.z);
            }
            return stream;
        }

        public Coordinates(MemoryStream stream)
        {
            using (BinaryReader reader = new BinaryReader(stream))
            {
                int count = reader.ReadInt32();
                Coords = new List<CoOrd>(count);
                for (int i = 0; i < count; i++)                
                {
                    Coords.Add(new CoOrd(reader.ReadInt32(),reader.ReadInt32(),reader.ReadInt32()));
                }
            }        
        }
        #endregion
    }

    [TestClass]
    public class SerializationTest
    {
        [TestMethod]
        public void TestBinaryFormatter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            // Serialize to memory stream
            MemoryStream mStream = new MemoryStream();
            BinaryFormatter bformatter = new BinaryFormatter();
            bformatter.Serialize(mStream, c);
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = (Coordinates)bformatter.Deserialize(mStream);
            Console.Write(c2.Coords.Count);

            mStream.Close();
        }

        [TestMethod]
        public void TestBinaryWriter()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = c.RawSerializeToStream();
            Console.WriteLine("Length : {0}", mStream.Length);

            // Now Deserialize
            mStream.Position = 0;
            Coordinates c2 = new Coordinates(mStream);
            Console.Write(c2.Coords.Count);
        }

        [TestMethod]
        public void TestProtoBufV2()
        {
            Coordinates c = new Coordinates();
            c.SetupTestArray();

            MemoryStream mStream = new MemoryStream();
            ProtoBuf.Serializer.Serialize(mStream,c);
            Console.WriteLine("Length : {0}", mStream.Length);

            mStream.Position = 0;
            Coordinates c2 = ProtoBuf.Serializer.Deserialize<Coordinates>(mStream);
            Console.Write(c2.Coords.Count);
        }
    }
}

结果的(注PB v2.0.0.423测试版)

                Serialize | Ser + Deserialize    | Size
-----------------------------------------------------------          
BinaryFormatter    2.89s  |      26.00s !!!      | 21.0 MB
ProtoBuf v2        0.52s  |       0.83s          | 18.7 MB
Raw BinaryWriter   0.27s  |       0.36s          | 11.4 MB

显然,这只是在看速度/规模,不考虑其他任何东西。

Obviously this is just looking at speed/size and doesn't take into account anything else.

推荐答案

使用二进制序列的BinaryFormatter 包括在其生成的字节类型信息。这占用了额外的空间。这是在情况下,你不知道会发生在另一端是什么结构的数据非常有用,例如。

Binary serialisation using BinaryFormatter includes type information in the bytes it generates. This takes up additional space. It's useful in cases where you don't know what structure of data to expect at the other end, for example.

在你的情况,你知道数据有什么格式的两端,而且听起来并不像它会改变。所以,你可以写一个简单的连接code和德code方法。你的坐标类不再需要被序列化了。

In your case, you know what format the data has at both ends, and that doesn't sound like it'd change. So you can write a simple encode and decode method. Your CoOrd class no longer needs to be serializable too.

我会用System.IO.BinaryReader和 System.IO .BinaryWriter 时,然后通过每个您COORD实例环路和读/写在X,Y,Z属性格式值到流。这些类甚至会收拾你的整数为小于11MB,假设你的很多数字比0x7F的和0x7FFF的更小。

I would use System.IO.BinaryReader and System.IO.BinaryWriter, then loop through each of your CoOrd instances and read/write the X,Y,Z propery values to the stream. Those classes will even pack your ints into less than 11MB, assuming many of your numbers are smaller than 0x7F and 0x7FFF.

事情是这样的:

using (var writer = new BinaryWriter(stream)) {
    // write the number of items so we know how many to read out
    writer.Write(points.Count);
    // write three ints per point
    foreach (var point in points) {
        writer.Write(point.X);
        writer.Write(point.Y);
        writer.Write(point.Z);
    }
}

从流中读取:

To read from the stream:

List<CoOrd> points;
using (var reader = new BinaryReader(stream)) {
    var count = reader.ReadInt32();
    points = new List<CoOrd>(count);
    for (int i = 0; i < count; i++) {
        var x = reader.ReadInt32();
        var y = reader.ReadInt32();
        var z = reader.ReadInt32();
        points.Add(new CoOrd(x, y, z));
    }
}

这篇关于改善结构的大名单二进制序列化性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆