Performance issue when serializing multi-dimensional arrays using BinaryFormatter in .NET


Problem description


I'm using the BinaryFormatter to serialize a fairly simple multi-dimensional array of floats, although I suspect that the problem occurs with any primitive type. My multi-dimensional array contains 10000x16 floats (160k) and serializing on my PC runs at ~8 MB/s (60 second benchmark writing ~500 MB to an SSD drive). Code:

        Stopwatch stopwatch = new Stopwatch();

        float[,] data = new float[10000 , 16];  // Two-dimensional array of 160,000 floats.
        // OR
        float[]  data = new float[10000 * 16];  // One-dimensional array of 160,000 floats.

        var formatter = new BinaryFormatter();
        var stream = new FileStream("C:\\Temp\\test_serialization.data", FileMode.Create, FileAccess.Write);

        // Serialize to disk the array 1000 times.
        stopwatch.Reset();
        stopwatch.Start();
        for (int i = 0; i < 1000; i++)
        {
            formatter.Serialize(stream, data);
        }
        stream.Close();
        stopwatch.Stop();

        TimeSpan ts = stopwatch.Elapsed;

        // Format and display the TimeSpan value.
        string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:000}",
            ts.Hours, ts.Minutes, ts.Seconds,
            ts.Milliseconds);
        Console.WriteLine("Runtime " + elapsedTime);
        var info = new FileInfo(stream.Name);
        Console.WriteLine("Speed: {0:0.00} MB/s", info.Length / ts.TotalSeconds / 1024.0 / 1024.0);

Doing the same thing but using a one-dimensional array of 160k floats, the same amount of data is serialized to disk at ~179 MB/s. Over 20x faster! Why does serializing a two-dimensional array using BinaryFormatter perform so poorly? The underlying storage of the two arrays in memory should be identical. (I've done unsafe native pin_ptr and copying to and from 2D arrays in C++/CLI.)
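The layout claim the question rests on is easy to verify: a `float[,]` and a `float[]` holding the same values in row-major order are byte-for-byte identical, which `Buffer.BlockCopy` (which accepts multi-dimensional arrays of primitives) can demonstrate. A minimal sketch, not part of the original post:

```csharp
using System;

static class LayoutCheck
{
    // Returns true when the two arrays occupy byte-identical storage.
    public static bool SameBytes(float[,] a, float[] b)
    {
        byte[] ba = new byte[a.Length * sizeof(float)];
        byte[] bb = new byte[b.Length * sizeof(float)];
        // BlockCopy treats both arrays as raw byte sequences.
        Buffer.BlockCopy(a, 0, ba, 0, ba.Length);
        Buffer.BlockCopy(b, 0, bb, 0, bb.Length);
        if (ba.Length != bb.Length) return false;
        for (int k = 0; k < ba.Length; k++)
            if (ba[k] != bb[k]) return false;
        return true;
    }

    static void Main()
    {
        // Same six values, laid out row-major in both shapes.
        float[,] data2d = { { 0f, 1f }, { 2f, 3f }, { 4f, 5f } };
        float[] data1d = { 0f, 1f, 2f, 3f, 4f, 5f };
        Console.WriteLine(SameBytes(data2d, data1d)); // True
    }
}
```

So the slowdown comes from how BinaryFormatter walks rank-2 arrays, not from the memory layout itself.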

A hackish solution would be to implement ISerializable and memcopy (unsafe/ptr pinning/block memcopy) the 2D array into a 1D array, then serialize that along with the dimensions. Another option I am considering is a switch to protobuf-net.
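That workaround could look roughly like the sketch below (this is an illustration, not the poster's code): flatten with `Buffer.BlockCopy` so the fast 1D path is what BinaryFormatter sees, and serialize the dimensions alongside it. Note that BinaryFormatter is disabled by default in recent .NET versions, so this targets .NET Framework.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class FlatSerializer
{
    // Flatten a 2D float array into a 1D array with a single block copy.
    public static float[] Flatten(float[,] data)
    {
        float[] flat = new float[data.Length];
        Buffer.BlockCopy(data, 0, flat, 0, data.Length * sizeof(float));
        return flat;
    }

    static void Main()
    {
        float[,] data = new float[10000, 16];
        data[1, 3] = 42f;            // row 1, column 3 → flat index 1 * 16 + 3
        float[] flat = Flatten(data);
        Console.WriteLine(flat[19]); // 42

        // Serialize the fast 1D array plus the original dimensions.
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, data.GetLength(0));
            formatter.Serialize(stream, data.GetLength(1));
            formatter.Serialize(stream, flat);
        }
    }
}
```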

Solution

No need to give up your data structure or copy values; you can use the following code to achieve the same performance:

            // Requires an unsafe context (compile with /unsafe).
            fixed (float* ptr = data)
            {
                // View the pinned float storage as raw bytes.
                byte* arr = (byte*)ptr;
                int size = sizeof(float);

                // Write every byte of the array to the stream.
                for (int j = 0; j < data.Length * size; j++)
                {
                    stream.WriteByte(arr[j]);
                }
            }

Basically, you write the output stream yourself and, as you said, treat the float[,] as a byte[] since the memory layout is the same.

Deserialization works the same way: you can either use a BinaryReader to read the floats back, or use unsafe code and load the data directly into memory.
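A self-contained round-trip sketch of that idea (my illustration, using `Buffer.BlockCopy` on the write side instead of the unsafe block above, and `BinaryReader` on the read side):

```csharp
using System;
using System.IO;

static class RawFloatIo
{
    // Write a 2D float array as raw bytes, then read it back with BinaryReader.
    public static float[,] RoundTrip(float[,] data)
    {
        int rows = data.GetLength(0), cols = data.GetLength(1);

        // Write side: blit the array's bytes straight into the stream.
        var stream = new MemoryStream();
        byte[] bytes = new byte[data.Length * sizeof(float)];
        Buffer.BlockCopy(data, 0, bytes, 0, bytes.Length);
        stream.Write(bytes, 0, bytes.Length);

        // Read side: restore the floats in the same row-major order.
        stream.Position = 0;
        float[,] restored = new float[rows, cols];
        using (var reader = new BinaryReader(stream))
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    restored[i, j] = reader.ReadSingle();
        return restored;
    }

    static void Main()
    {
        float[,] data = new float[2, 3];
        data[1, 2] = 7.5f;
        Console.WriteLine(RoundTrip(data)[1, 2]); // 7.5
    }
}
```

Since the dimensions are not stored in the raw bytes, you would need to write them to the stream (or know them in advance) before reading the data back.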

If you have basic needs like this, though, I'd strongly discourage using protobuf-net. Development has slowed down and the project relies on a single maintainer, so it's fairly risky (when I tried to help with a performance issue, he did not even bother to look at the changes I offered). However, if you want to serialize complex data structures, binary serialization would not be much slower than protobuf, although the latter is not officially supported on the .NET platform (Google released code for Java, Python and C++).
