GZipStream doesn't detect corrupt data (even CRC32 passes)?

Question
I'm using GZipStream to compress / decompress data. I chose this over DeflateStream since the documentation states that GZipStream also adds a CRC to detect corrupt data, which is another feature I wanted. My "positive" unit tests are working well in that I can compress some data, save the compressed byte array and then successfully decompress it again. The .NET GZipStream compress and decompress problem post helped me realize that I needed to close the GZipStream before accessing the compressed or decompressed data.

Next, I continued to write a "negative" unit test to be sure corrupt data could be detected. I had previously used the MSDN example for the GZipStream class to compress a file, then opened the compressed file in a text editor, changed a byte to corrupt it (as if opening it with a text editor wasn't bad enough!), saved it, and decompressed it to be sure that I got an InvalidDataException as expected.

When I wrote the unit test, I picked an arbitrary byte to corrupt (e.g., compressedDataBytes[50] = 0x99) and got an InvalidDataException. So far so good. I was curious, so I chose another byte, but to my surprise I did not get an exception. This may be okay (e.g., I coincidentally hit an unused byte in a block of data), so long as the data could still be recovered successfully. However, I didn't get the correct data back either!

To be sure "it wasn't me", I took the cleaned-up code from the bottom of .NET GZipStream compress and decompress problem and modified it to corrupt each byte of the compressed data in turn until it failed to decompress properly. Here are the changes (note that I'm using the Visual Studio 2010 test framework):

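using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// successful compress / decompress example code from: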
//    http://stackoverflow.com/questions/1590846/net-gzipstream-compress-and-decompress-problem
[TestMethod]
public void Test_zipping_with_memorystream_and_corrupting_compressed_data()
{
   const string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";
   var encoding = new ASCIIEncoding();
   var data = encoding.GetBytes(sample);
   string sampleOut = null;
   byte[] cmpData;

   // Compress 
   using (var cmpStream = new MemoryStream())
   {
      using (var hgs = new GZipStream(cmpStream, CompressionMode.Compress))
      {
         hgs.Write(data, 0, data.Length);
      }
      cmpData = cmpStream.ToArray();
   }

   int corruptBytesNotDetected = 0;

   // corrupt data byte by byte
   for (var byteToCorrupt = 0; byteToCorrupt < cmpData.Length; byteToCorrupt++)
   {
      // corrupt the data
      cmpData[byteToCorrupt]++;

      using (var decomStream = new MemoryStream(cmpData))
      {
         using (var hgs = new GZipStream(decomStream, CompressionMode.Decompress))
         {
            using (var reader = new StreamReader(hgs))
            {
               try
               {
                  sampleOut = reader.ReadToEnd();

                  // if we get here, the corrupt data was not detected by GZipStream
                  // ... okay so long as the correct data is extracted
                  corruptBytesNotDetected++;

                  var message = string.Format("ByteCorrupted = {0}, CorruptBytesNotDetected = {1}",
                     byteToCorrupt, corruptBytesNotDetected);

                  Assert.IsNotNull(sampleOut, message);
                  Assert.AreEqual(sample, sampleOut, message);
               }
               catch(InvalidDataException)
               {
                  // data was corrupted, so we expect to get here
               }
            }
         }
      }

      // restore the data
      cmpData[byteToCorrupt]--;
   }
}

When I run this test, I get:

Assert.AreEqual failed. Expected:<This is a compression test of microsoft .net gzip compression method and decompression methods>. Actual:<>. ByteCorrupted = 11, CorruptBytesNotDetected = 8

So there were actually 7 cases where corrupting the data made no difference (the string was recovered successfully), but corrupting byte 11 neither threw an exception nor recovered the data. (The counter reads 8 because it is incremented for byte 11 before the assertion fails.)

Am I missing something or doing something wrong? Can anyone see why the corrupt compressed data is not being detected?

Solution

There is a 10-byte header in the gzip format, for which the last seven bytes can be changed without resulting in a decompression error. So the seven cases you noted with no corruption are expected.
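The following is a minimal C sketch of the checks a decoder can apply to the fixed 10-byte gzip header described in RFC 1952. It is not zlib's code, and `gz_check_header` and its status names are hypothetical. It assumes a single member whose FLG byte is 0 (no optional fields), which is what makes incrementing that byte harmless.

```c
#include <stddef.h>

/* Status codes for this sketch (hypothetical names, not a zlib API). */
enum gz_header_status {
    GZ_HEADER_OK = 0,
    GZ_HEADER_TOO_SHORT,
    GZ_HEADER_BAD_MAGIC,  /* ID1/ID2 must be 0x1f 0x8b            */
    GZ_HEADER_BAD_METHOD, /* CM must be 8 (deflate)               */
    GZ_HEADER_BAD_FLAGS   /* FLG bits 5-7 are reserved, must be 0 */
};

/*
 * Validate the fixed 10-byte gzip member header (RFC 1952, section 2.3):
 *   bytes 0-1  ID1, ID2  magic number      -> checked
 *   byte  2    CM        method            -> checked
 *   byte  3    FLG       flags             -> only reserved bits checked
 *   bytes 4-7  MTIME     modification time -> any value accepted
 *   byte  8    XFL       extra flags       -> any value accepted
 *   byte  9    OS        operating system  -> any value accepted
 * Optional fields announced by FLG (FEXTRA, FNAME, ...) are not parsed here.
 */
enum gz_header_status gz_check_header(const unsigned char *buf, size_t len)
{
    if (len < 10)
        return GZ_HEADER_TOO_SHORT;
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return GZ_HEADER_BAD_MAGIC;
    if (buf[2] != 8)
        return GZ_HEADER_BAD_METHOD;
    if (buf[3] & 0xe0)
        return GZ_HEADER_BAD_FLAGS;
    return GZ_HEADER_OK;
}
```

Take a header such as `1f 8b 08 00 00 00 00 00 00 03` and increment each of its ten bytes in turn. The result is two magic-number failures, one method failure and seven accepted headers, the same split the answer reports. The FLG case is only harmless because 0x00 + 1 sets FTEXT, which is just a hint. A value that set FEXTRA, FNAME, FCOMMENT or FHCRC would change how the rest of the stream is parsed.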

It should be vanishingly rare for a corruption anywhere else in the stream to go undetected. Most of the time the decompressor will detect an error in the format of the compressed data, never even getting to the point of checking the CRC. If it does get to the point of checking the CRC, that check should fail very nearly all the time with a corrupted input stream. ("Nearly all the time" means a probability of about 1 - 2^-32.)
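The CRC referred to here is the CRC-32 stored in the gzip trailer. Below is a minimal bitwise C sketch, assuming the standard parameters from RFC 1952: reflected polynomial 0xEDB88320, with an initial value and final XOR of 0xFFFFFFFF. `crc32_update` is a hypothetical name chosen here. A real decoder would use a table-driven version or a library routine such as zlib's `crc32()`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 with the parameters gzip uses (RFC 1952): reflected
 * polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
 * Start with crc = 0; pass an earlier result to continue over more data.
 */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                /* initial value, or undo the previous final XOR */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;               /* final XOR */
}
```

Over the nine bytes `"123456789"` this yields 0xCBF43926, the standard check value for this CRC. Corrupting the compressed data changes the decompressed output in essentially arbitrary ways. A wrong output that still reaches the CRC check therefore matches the stored value only with probability about 2^-32 (roughly 2.3 × 10^-10).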

I just tried it (in C with zlib) using your sample string, which produces an 84-byte gzip stream. Incrementing each of the 84 bytes in turn while leaving the rest unchanged, as you did, resulted in: two incorrect header checks, one invalid compression method, seven successes, one invalid block type, four invalid distances set, seven invalid code lengths set, four missing end-of-block, 11 invalid bit length repeat, three invalid bit length repeat, two invalid bit length repeat, two unexpected end of stream, 36 incorrect data check (that's the actual CRC error), and four incorrect length check (another check in the gzip format, for the correct uncompressed data length). In no case was corruption of the compressed data left undetected.
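The last two error kinds in that list, "incorrect data check" and "incorrect length check", come from the 8-byte trailer that ends every gzip member. The trailer holds the CRC-32 of the uncompressed data followed by ISIZE, the uncompressed length modulo 2^32. Both are stored little-endian (RFC 1952). Below is a minimal C sketch of that comparison. It assumes a single-member stream and that the caller has already decompressed the data and computed its CRC-32. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Status codes for this sketch (hypothetical names); the comments give the
 * corresponding zlib error messages quoted in the answer. */
enum gz_trailer_status {
    GZ_TRAILER_OK = 0,
    GZ_TRAILER_TOO_SHORT,
    GZ_TRAILER_BAD_CRC,    /* "incorrect data check"   */
    GZ_TRAILER_BAD_LENGTH  /* "incorrect length check" */
};

/* gzip stores multi-byte integers least significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Compare the 8-byte trailer at the end of a single-member gzip stream
 * (CRC32, then ISIZE = uncompressed length mod 2^32) against the CRC-32 and
 * length the caller computed over the data it actually decompressed.
 */
enum gz_trailer_status gz_check_trailer(const unsigned char *gz, size_t gz_len,
                                        uint32_t actual_crc, uint64_t actual_len)
{
    if (gz_len < 8)
        return GZ_TRAILER_TOO_SHORT;
    const unsigned char *trailer = gz + gz_len - 8;
    if (read_le32(trailer) != actual_crc)
        return GZ_TRAILER_BAD_CRC;
    if (read_le32(trailer + 4) != (uint32_t)(actual_len & 0xFFFFFFFFu))
        return GZ_TRAILER_BAD_LENGTH;
    return GZ_TRAILER_OK;
}
```

In the question's failing case (byte 11), the decompressed output was empty, so the computed length is 0 while the stored ISIZE is the sample's length. Assuming the trailer bytes themselves were intact, a comparison like this would not have returned `GZ_TRAILER_OK`.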

So there must be a bug somewhere, either in your code or in the class.

Update:

It appears that there is a bug in the class.

Remarkably (or maybe not remarkably), Microsoft has concluded that they won't fix this bug!