zlib compressing byte array?


Question


I have this uncompressed byte array:

0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00

I need to compress it using the deflate algorithm (as implemented in zlib). From what I found, the equivalent in C# would be GZipStream, but I can't get the compressed result to match at all.

Here is the compressing code:

public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}

Here is the result of the above compressing code:

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00

Here is the result I am expecting:

78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8

What am I doing wrong? Could someone help me out?

Solution

First, some information: DEFLATE is the compression algorithm, defined in RFC 1951. DEFLATE is used in the ZLIB and GZIP formats, defined in RFC 1950 and RFC 1952 respectively, which are essentially thin wrappers around DEFLATE bytestreams. The wrappers provide metadata and integrity checks: GZIP can carry the name of the file and a timestamp and ends with a CRC-32, while ZLIB has a two-byte header and ends with an Adler-32 checksum.
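Because the wrappers differ in their very first bytes, a buffer's format can usually be guessed by inspecting them. The following is a minimal sketch, not part of any library: the class and method names are made up for illustration. It relies only on the GZIP magic bytes 1F 8B from RFC 1952 and on the ZLIB header rule from RFC 1950 (compression method 8, and the first two bytes read as a big-endian number must be a multiple of 31). A raw DEFLATE stream has no signature, so it can only be reported as unrecognized, and in rare cases it could pass the ZLIB check by coincidence.

```csharp
using System;

static class CompressionFormat
{
    // Guess the wrapper format from the first two bytes.
    // Returns null when no wrapper is recognized (possibly raw DEFLATE).
    public static string Identify(byte[] data)
    {
        if (data == null || data.Length < 2) return null;

        // RFC 1952: every GZIP member starts with the magic bytes 1F 8B.
        if (data[0] == 0x1F && data[1] == 0x8B) return "GZIP";

        // RFC 1950: the low nibble of the first byte (CMF) is 8 for deflate,
        // and CMF * 256 + FLG must be divisible by 31.
        bool isDeflateMethod = (data[0] & 0x0F) == 8;
        bool checkBitsOk = ((data[0] << 8) | data[1]) % 31 == 0;
        if (isDeflateMethod && checkBitsOk) return "ZLIB";

        return null;
    }
}
```

Applied to the data in the question, the GZipStream output (1F 8B ...) is classified as GZIP and the expected result (78 9C ...) as ZLIB.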

.NET's base class library implements a DeflateStream that produces a raw DEFLATE bytestream when used for compression, and consumes one when used for decompression. .NET also provides a GZipStream, which is just a GZIP wrapper around that base. At the time this answer was written there was no ZlibStream in the .NET base class library, nothing that produces or consumes ZLIB (System.IO.Compression.ZLibStream was added later, in .NET 6). There are some tricks for doing it; you can search around.
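One such trick, sketched here as an illustration rather than as the method this answer recommends, is to build the ZLIB wrapper by hand around the built-in DeflateStream: write the two-byte header, then the raw DEFLATE data, then the Adler-32 checksum of the uncompressed input in big-endian order, as RFC 1950 describes. The class name ZlibWrapper and its methods are hypothetical. The header 78 9C is hard-coded; its level bits are only a hint, and the compressed bytes in the middle will generally differ from what zlib itself produces.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZlibWrapper
{
    public static byte[] Compress(byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            // ZLIB header: CMF = 0x78 (deflate, 32 KB window), FLG = 0x9C.
            ms.WriteByte(0x78);
            ms.WriteByte(0x9C);

            // Raw DEFLATE body; leaveOpen = true so the trailer can follow.
            using (var deflate = new DeflateStream(ms, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }

            // Trailer: Adler-32 of the uncompressed data, most significant byte first.
            uint adler = Adler32(input);
            ms.WriteByte((byte)(adler >> 24));
            ms.WriteByte((byte)(adler >> 16));
            ms.WriteByte((byte)(adler >> 8));
            ms.WriteByte((byte)adler);

            return ms.ToArray();
        }
    }

    // Adler-32 as defined in RFC 1950: two running sums modulo 65521.
    public static uint Adler32(byte[] data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte x in data)
        {
            a = (a + x) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }
}
```

A result built this way should be readable by any ZLIB decoder, but it is not expected to match another compressor's output byte for byte.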

The deflate logic in .NET exhibits a behavioral anomaly, where previously compressed data can actually be inflated, significantly, when "compressed". This was the source of a Connect bug raised with Microsoft, and it has been discussed here on SO. This may explain the ineffective compression you are seeing. Microsoft rejected the bug because, while the output is ineffective for saving space, the compressed stream is not invalid; in other words, it can be decompressed by any compliant DEFLATE engine.

In any case, as someone else posted, the compressed bytestreams produced by different compressors are not necessarily the same. The result depends on the compressor's default settings and on the settings the application specifies. Even when the compressed bytestreams differ, they can still decompress to the same original bytestream. More importantly, you compressed with GZIP, while it appears that what you want is ZLIB. While the two are related, they are not the same; you cannot use GZipStream to produce a ZLIB bytestream. This is the primary source of the difference you see.
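To make the point concrete, here is a small sketch that uses only the built-in GZipStream and DeflateStream. The class and helper names are invented for this example. It compresses the same input in both formats, confirms that the two compressed buffers differ, and confirms that each one decompresses back to the original bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class RoundTrip
{
    // Compress with whichever stream type the caller wraps around the buffer.
    static byte[] Compress(byte[] input, Func<Stream, Stream> wrap)
    {
        using (var ms = new MemoryStream())
        {
            using (var z = wrap(ms))
            {
                z.Write(input, 0, input.Length);
            }
            return ms.ToArray();
        }
    }

    static byte[] Decompress(byte[] data, Func<Stream, Stream> wrap)
    {
        using (var src = new MemoryStream(data))
        using (var z = wrap(src))
        using (var dst = new MemoryStream())
        {
            z.CopyTo(dst);
            return dst.ToArray();
        }
    }

    // True when the GZIP and raw DEFLATE outputs differ from each other
    // but both restore the original bytes.
    public static bool BothRestoreOriginal(byte[] original)
    {
        byte[] gz = Compress(original, s => new GZipStream(s, CompressionMode.Compress));
        byte[] raw = Compress(original, s => new DeflateStream(s, CompressionMode.Compress));

        byte[] fromGz = Decompress(gz, s => new GZipStream(s, CompressionMode.Decompress));
        byte[] fromRaw = Decompress(raw, s => new DeflateStream(s, CompressionMode.Decompress));

        return !gz.SequenceEqual(raw)
            && fromGz.SequenceEqual(original)
            && fromRaw.SequenceEqual(original);
    }
}
```

Comparing compressed buffers is therefore a poor correctness test; comparing the decompressed data is the reliable check.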


I think you want a ZLIB stream.

The free managed Zlib in the DotNetZip project implements compressing streams for all three formats (DEFLATE, ZLIB, GZIP). Its DeflateStream and GZipStream work the same way as the .NET built-in classes, and there is also a ZlibStream class that does what you think it does. None of these classes exhibit the behavioral anomaly I described above.


In code it looks like this:

    byte[] original = new byte[] {
        0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
        0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
        0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);

The output is like this:

0000    78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@.@..5%.
0030    CE                                                  .

To decompress,

    var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);
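If adding DotNetZip is not an option and the project targets .NET 6 or later, the built-in System.IO.Compression.ZLibStream can play the same role. The sketch below is an alternative to the Ionic.Zlib calls above, not something from the original answer. The wrapper class BuiltInZlib is a made-up name, and which header byte follows 78 depends on the runtime and the chosen CompressionLevel, so no exact output is claimed here.

```csharp
using System.IO;
using System.IO.Compression;

static class BuiltInZlib
{
    // Produce a ZLIB (RFC 1950) stream using the framework's own ZLibStream.
    public static byte[] Compress(byte[] input)
    {
        using (var ms = new MemoryStream())
        {
            using (var z = new ZLibStream(ms, CompressionLevel.Optimal))
            {
                z.Write(input, 0, input.Length);
            }
            return ms.ToArray();
        }
    }

    // Read a ZLIB stream back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var src = new MemoryStream(compressed))
        using (var z = new ZLibStream(src, CompressionMode.Decompress))
        using (var dst = new MemoryStream())
        {
            z.CopyTo(dst);
            return dst.ToArray();
        }
    }
}
```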


You can see the documentation on the static CompressBuffer method.


EDIT

The question was raised: why does DotNetZip produce 78 DA for the first two bytes instead of 78 9C? The difference is immaterial. The second header byte carries a compression-level hint: 78 DA encodes "max compression", while 78 9C encodes "default compression". As you can see in the data, for this small sample the actual compressed bytes are exactly the same whether using BEST or DEFAULT. Also, the compression level information is not used during decompression. It has no effect on your application.
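The level hint can be read straight from the header. Per RFC 1950, the top two bits of the second byte (FLG) form the FLEVEL field: 0 is fastest, 1 is fast, 2 is default, and 3 is maximum compression. A minimal sketch, with an illustrative class name that is not part of any library:

```csharp
using System;

static class ZlibHeader
{
    static readonly string[] LevelNames = { "fastest", "fast", "default", "maximum" };

    // FLEVEL lives in the two most significant bits of the FLG byte.
    public static int LevelHint(byte flg)
    {
        return flg >> 6;
    }

    public static string LevelName(byte flg)
    {
        return LevelNames[LevelHint(flg)];
    }
}
```

For the two headers discussed here, 0x9C is binary 10011100, giving FLEVEL 2 (default), and 0xDA is binary 11011010, giving FLEVEL 3 (maximum). The remaining low bits differ only because the header's check bits are adjusted to keep the two-byte value a multiple of 31.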

If you don't want "max" compression, in other words if you are set on getting 78 9C as the first two bytes even though it doesn't matter, then you cannot use the CompressBuffer convenience function, which uses the best compression level under the covers. Instead you can do this:

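    var compress = new Func<byte[], byte[]>(a =>
    {
        using (var ms = new System.IO.MemoryStream())
        {
            // Fully qualified so the Ionic.Zlib enums cannot be confused
            // with the System.IO.Compression types of the same name.
            using (var compressor =
                   new Ionic.Zlib.ZlibStream(ms,
                                             Ionic.Zlib.CompressionMode.Compress,
                                             Ionic.Zlib.CompressionLevel.Default))
            {
                compressor.Write(a, 0, a.Length);
            }

            // The compressor is disposed before the buffer is read,
            // so the final block and checksum have been written.
            return ms.ToArray();
        }
    });

    var original = new byte[] { .... };
    var compressed = compress(original);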

The result is:

0000    78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@.@..5%.
0030    CE                                                  .
