C# - How to save byte values to file with smallest size possible?

Problem description

I need to serialize the following data in the smallest file size possible.

I have a collection of patterns, each pattern is a byte array (byte[]) of a set length.

In this example let's use a pattern length of 5, so byte array will be:

var pattern = new byte[] {1, 2, 3, 4, 5};

Let's say we have 3 of the same pattern in a collection:

var collection = new byte[][] { pattern, pattern, pattern };

Currently I am saving the collection in an ASCII encoded file. Using the collection above, the saved file would look like this:

010203040501020304050102030405

Each byte in the array is written as 2 zero-padded digits (for example 01) so that I can cater for byte values from 0 to 25. It can be visualized like this:

[01|02|03|04|05] [01|02|03|04|05] [01|02|03|04|05]

When I deserialize the file, I parse each block of 2 characters as a byte and put every 5 bytes into a byte array.

As I understand it, each character in the ASCII-encoded file is one byte - offering a possible 256 different values, but all I need is for each block of 2 characters to be a possible decimal value from 0 to 25.
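
As a baseline for comparison, here is a minimal sketch of the simplest improvement: since every value fits in one byte, the patterns can be written as raw bytes instead of two ASCII characters each, which halves the file. The class name `RawPatternFile` and its methods are hypothetical, and the sketch assumes the pattern length is known when the file is loaded.

```csharp
using System;
using System.IO;
using System.Linq;

public static class RawPatternFile
{
    // Writes all patterns back to back, one byte per value, no separators.
    public static void Save(string path, byte[][] patterns)
    {
        File.WriteAllBytes(path, patterns.SelectMany(p => p).ToArray());
    }

    // Reads the file back and cuts it into arrays of patternLength bytes.
    // Assumption: patternLength is positive and matches the length used when saving.
    public static byte[][] Load(string path, int patternLength)
    {
        byte[] all = File.ReadAllBytes(path);
        if (all.Length % patternLength != 0)
            throw new InvalidDataException("File size is not a multiple of the pattern length.");

        var result = new byte[all.Length / patternLength][];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = new byte[patternLength];
            Array.Copy(all, i * patternLength, result[i], 0, patternLength);
        }
        return result;
    }
}
```

With the three-pattern example above this produces a 15-byte file instead of 30 characters. It still spends 8 bits on values that only need about 4.7, which is the gap the answer below tries to close.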

When I save a file with 50,000 patterns, each with a length of 12, I end up with a 1.7MB file, which is way too big.

What encoding can I use in C# to make my file size much smaller?

Please provide example code of how to write and read this data to/from a file.

Solution

I did something similar when encoding binary data into barcodes (see Efficient compression and representation of key value pairs to be read from 1D barcodes). Consider the following code which will serialize samples into a file and deserialize them immediately:

static void Main(string[] args)
{
    var data = new List<byte[]>() {
        new byte[] { 01, 05, 15, 04, 11, 00, 01, 01, 05, 15, 04, 11, 00, 01 },
        new byte[] { 09, 04, 02, 00, 08, 12, 01, 07, 04, 02, 00, 08, 12, 01 },
        new byte[] { 01, 05, 06, 04, 02, 00, 01, 01, 05, 06, 04, 02, 00, 01 }
    };

    // has to be known when loading the file
    var reasonableBase = data.SelectMany(i => i).Max() + 1;

    // File.Create truncates an existing file; File.OpenWrite would leave stale bytes at the end
    using (var target = File.Create("data.bin"))
    {
        using (var writer = new BinaryWriter(target))
        {
            // write the number of lines (16 bit, so at most 65535 lines)
            writer.Write((ushort)data.Count);

            // write the base (8 bit, base limited to 255)
            writer.Write((byte)reasonableBase);

            foreach (var sample in data)
            {
                // converts the byte array into a large number of the known base (bypasses all the bit-mess)
                var serializedData = ByteArrayToNumberBased(sample, reasonableBase).ToByteArray();

                // write the length of the sample (8 bit, limited to 255)
                writer.Write((byte)serializedData.Length);
                writer.Write(serializedData);
            }
        }
    }

    var deserializedData = new List<byte[]>();

    using (var source = File.OpenRead("data.bin"))
    {
        using (var reader = new BinaryReader(source))
        {
            var lines = reader.ReadUInt16();
            var sourceBase = reader.ReadByte();

            for (int i = 0; i < lines; i++)
            {
                var length = reader.ReadByte();
                var value = new BigInteger(reader.ReadBytes(length));

                // chunk the bytes back of the big number we loaded
                // works because we know the base
                deserializedData.Add(NumberToByteArrayBased(value, sourceBase));
            }
        }
    }
}

private static BigInteger ByteArrayToNumberBased(byte[] data, int numBase)
{
    var result = BigInteger.Zero;

    for (int i = 0; i < data.Length; i++)
    {
        result += data[i] * BigInteger.Pow(numBase, i);
    }

    return result;
}

private static byte[] NumberToByteArrayBased(BigInteger data, int numBase)
{
    var list = new List<Byte>();

    do
    {
        list.Add((byte)(data % numBase));
    }
    while ((data = (data / numBase)) > 0);

    return list.ToArray();
}
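
One caveat with the decoder above: it stops as soon as the remaining number reaches zero, so a pattern whose last values are 0 comes back shorter than it went in (for example {1, 0} decodes to {1}). Because the question states that every pattern has a set length, a possible fix is to always emit that many digits. The sketch below is a hypothetical variant, not part of the original answer; the class name `FixedLengthDigits` and the extra `length` parameter are assumptions.

```csharp
using System;
using System.Numerics;

public static class FixedLengthDigits
{
    // Hypothetical variant of NumberToByteArrayBased that always returns
    // exactly `length` digits, so trailing zero values are preserved.
    public static byte[] NumberToByteArrayBased(BigInteger data, int numBase, int length)
    {
        var result = new byte[length];

        for (int i = 0; i < length; i++)
        {
            result[i] = (byte)(data % numBase); // lowest digit first, as in the encoder
            data /= numBase;
        }

        // Anything left over means the number needs more digits than expected.
        if (!data.IsZero)
            throw new ArgumentException("Value does not fit in the given number of digits.");

        return result;
    }
}
```

The pattern length would then have to be stored once in the file header (or be known to the reader) alongside the base.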

Compared to your format, the sample data serializes to 27 bytes instead of 90. Using @xanatos's figure of 4.7 bits per symbol, the perfect result would be 14 * 3 * 4.7 / 8 = 24.675 bytes, so that's not bad (to be fair: the example serializes to 30 bytes with the base set to 26).
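
The 4.7 bits per symbol figure is log2(26), the information needed to pick one of 26 values. A small sketch of the estimate follows, assuming all values are equally likely and ignoring any header bytes; the class `SizeEstimate` and its method are hypothetical names used only for illustration.

```csharp
using System;

public static class SizeEstimate
{
    // Lower bound in bytes for `count` patterns of `length` symbols each,
    // assuming every symbol is one of `alphabetSize` equally likely values.
    public static double MinimumBytes(int count, int length, int alphabetSize)
    {
        double bitsPerSymbol = Math.Log(alphabetSize, 2); // about 4.70 for 26 values
        return (double)count * length * bitsPerSymbol / 8.0;
    }
}
```

For the question's 50,000 patterns of length 12, `SizeEstimate.MinimumBytes(50000, 12, 26)` comes out at roughly 350 KB, compared with 600 KB at one byte per value. If some values occur much more often than others, a general-purpose compressor may get below this figure, since the bound only holds for uniformly distributed values.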
