Optimising binary serialization for multi-dimensional generic arrays


Problem description

I have a class that I need to binary serialize. The class contains one field as below:

private T[,] m_data;

These multi-dimensional arrays can be fairly large (hundreds of thousands of elements) and of any primitive type. When I tried standard .NET serialization on an object, the file written to disk was large. I suspect .NET is storing a lot of repeated data about element types, and possibly not as efficiently as it could.

I have looked around for custom serializers but have not seen any that deal with multi-dimensional generic arrays. I have also experimented with the built-in .NET compression, applied to the byte array of the memory stream after serializing. That worked to some degree, but it was neither as quick nor as compact as I had hoped.
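The compression experiment described above might look roughly like the following. This is only a sketch of the general approach, not the asker's actual code; the class and method names (`GzipBytes`, `Compress`, `Decompress`) are hypothetical, and the input is assumed to be the byte array taken from the `MemoryStream` after serializing.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not from the original post): gzip a byte array in memory.
public static class GzipBytes
{
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen keeps 'output' usable after the GZipStream is disposed.
            using (var gz = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gz.Write(input, 0, input.Length);
            } // Disposing the GZipStream flushes the remaining compressed data.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var gz = new GZipStream(source, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gz.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

How much this saves depends entirely on the data: highly repetitive values shrink a lot, while values that look random barely shrink at all.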

My question is: should I try to write a custom serializer that serializes this array optimally for the appropriate type (this seems a little daunting), or should I use standard .NET serialization and add compression?

Any advice on the best approach would be most appreciated, as would links to resources showing how to tackle serialization of a multi-dimensional generic array. As mentioned, the existing examples I have found do not support such structures.

Solution

Here's what I came up with. The code below makes a jagged int[][] of 10,000 rows by 1,000 columns and writes it out using the BinaryFormatter to 2 files: one zipped and one not.

The zipped file is 1.19 MB (1,255,339 bytes); unzipped it is 38.2 MB (40,150,034 bytes).

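        using System.Collections.Generic;
        using System.IO;
        using System.IO.Compression;
        using System.Linq;
        using System.Runtime.Serialization.Formatters.Binary;

        int width = 1000;
        int height = 10000;
        // Build 10,000 rows, each holding the values 0..999.
        List<int[]> list = new List<int[]>();
        for (int i = 0; i < height; i++)
        {
            list.Add(Enumerable.Range(0, width).ToArray());
        }
        int[][] bazillionInts = list.ToArray();
        // Note: BinaryFormatter is marked obsolete in .NET 5 and later.
        using (FileStream fsZ = new FileStream("c:\\temp_zipped.txt", FileMode.Create))
        using (FileStream fs = new FileStream("c:\\temp_notZipped.txt", FileMode.Create))
        using (GZipStream gz = new GZipStream(fsZ, CompressionMode.Compress))
        {
            BinaryFormatter f = new BinaryFormatter();
            f.Serialize(gz, bazillionInts); // compressed copy
            f.Serialize(fs, bazillionInts); // uncompressed copy
        }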

I can't think of a better or easier way to do this. The zipped version is pretty damn tight.

I'd go with the BinaryFormatter + GZipStream. Making something custom would not be fun at all.


[edit by MG] I hope you won't be offended by an edit, but the uniform repeated Range(0,width) is skewing things vastly; change to:

        int width = 1000;
        int height = 10000;
        Random rand = new Random(123456); // fixed seed, so runs are repeatable
        int[,] bazillionInts = new int[width, height];
        for (int i = 0; i < width; i++)
        {
            for (int j = 0; j < height; j++)
            {
                bazillionInts[i, j] = rand.Next(50000);
            }
        }

Try it, and you'll see temp_notZipped.txt at 40 MB and temp_zipped.txt at 62 MB. Not so appealing...
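For comparison, here is a minimal sketch of what the custom route from the question could look like for a `T[,]` of a primitive type. It is not from the original answers: the class `RawArray2D` and its `Write` and `Read` methods are hypothetical names, and the format (two `int` dimensions followed by the raw element bytes) is just one simple assumption. Note that 1000 × 10000 four-byte ints are 40,000,000 bytes of payload on their own, so the roughly 40 MB uncompressed BinaryFormatter files above are already close to the floor for `int`. A raw format like this mainly removes the formatter's overhead, not the payload.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): raw binary I/O for a 2D array.
public static class RawArray2D
{
    // Writes both dimensions, then the elements as one contiguous byte block.
    // T must be a primitive type (int, double, byte, ...); Buffer.ByteLength and
    // Buffer.BlockCopy throw ArgumentException for anything else.
    public static void Write<T>(Stream stream, T[,] data) where T : struct
    {
        byte[] bytes = new byte[Buffer.ByteLength(data)];
        Buffer.BlockCopy(data, 0, bytes, 0, bytes.Length);
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(data.GetLength(0));
            writer.Write(data.GetLength(1));
            writer.Write(bytes);
        }
    }

    // Reads an array previously written by Write with the same element type T.
    public static T[,] Read<T>(Stream stream) where T : struct
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int rows = reader.ReadInt32();
            int cols = reader.ReadInt32();
            var data = new T[rows, cols];
            int expected = Buffer.ByteLength(data);
            byte[] bytes = reader.ReadBytes(expected);
            if (bytes.Length != expected)
            {
                throw new EndOfStreamException("Stream ended before the array was complete.");
            }
            Buffer.BlockCopy(bytes, 0, data, 0, bytes.Length);
            return data;
        }
    }
}
```

Because both methods take a plain `Stream`, the same calls work against a `FileStream` directly or against a `GZipStream` wrapped around one, so compression stays an independent choice. The sketch does not record the element type in the file and assumes the array fits in a single `byte[]`, so the caller has to know `T` when reading.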
