DeflateStream advancing underlying stream to end


Problem description

I'm trying to read out git objects from a git pack file, following the format for pack files laid out here. Once I hit the compressed data I'm running into issues. I'm trying to use System.IO.Compression.DeflateStream to decompress the zlib compressed objects. I basically ignore the zlib headers by skipping over the first 2 bytes. These 2 bytes for the first object anyway are 789C. Now the trouble starts.
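
In rough outline, the decompression step looks like this (a simplified sketch rather than my exact code; PackReader, packStream and decompressedSize are just placeholder names):

```csharp
using System.IO;
using System.IO.Compression;

// Sketch of the approach described above (hypothetical helper, not my real code).
static class PackReader
{
    // packStream is positioned at the start of a zlib-compressed object;
    // decompressedSize comes from the object's type/size header in the pack file.
    public static byte[] ReadObject(Stream packStream, int decompressedSize)
    {
        // Skip the 2-byte zlib header (e.g. 0x78 0x9C) so DeflateStream sees raw deflate data.
        packStream.ReadByte();
        packStream.ReadByte();

        var buffer = new byte[decompressedSize];
        using (var deflate = new DeflateStream(packStream, CompressionMode.Decompress, leaveOpen: true))
        {
            int total = 0;
            while (total < buffer.Length)
            {
                // Read may return fewer bytes than requested, so loop until the buffer is full.
                int read = deflate.Read(buffer, total, buffer.Length - total);
                if (read == 0) break; // end of the deflate data
                total += read;
            }
        }
        return buffer;
    }
}
```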

1) I only know the size of the decompressed objects. The Read method documentation on DeflateStream states that it "Reads a number of decompressed bytes into the specified byte array." That is what I want; however, I do see people setting this count to the size of the compressed data, so one of us is doing it wrong.

2) The data I'm getting back is correct, I think (human-readable data that looks right); however, it's advancing the underlying stream I give it all the way to the end! For example, I ask it for 187 decompressed bytes and it reads the remaining 212 bytes, all the way to the end of the stream. That is, the whole stream is 228 bytes, and after the deflate read of 187 bytes the stream's position is now 228. I can't seek backwards, as I don't know where the end of the compressed data is, and not all the streams I use are seekable anyway. Is it expected behavior for it to consume the whole stream?

Recommended answer

According to the page you reference (I'm not familiar with this file format myself), each block of data is indexed by an offset field in the index for the file. Since you know the length of the type and data length fields that precede each data block, and you know the offset of the next block, you also know the length of each data block (i.e. the length of the compressed bytes).

That is, the length of each data block is simply the offset of the next block minus the offset of the current block, then minus the length of the type and data length fields (however many bytes that is…according to the documentation, it's variable, but you can certainly compute that length as you read it).
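
In code, that computation is just the following (a sketch; currentOffset and nextOffset are assumed to have been read from the pack index, and headerLength is however many bytes you consumed while parsing the variable-length type/size header):

```csharp
// Compressed length of the current block, derived from the pack index offsets
// (all three variables are assumed to have been obtained elsewhere).
long compressedLength = nextOffset - currentOffset - headerLength;
```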

So:

1) I only know the size of the decompressed objects. The Read method documentation on DeflateStream states that it "Reads a number of decompressed bytes into the specified byte array." That is what I want; however, I do see people setting this count to the size of the compressed data, so one of us is doing it wrong.

The documentation is correct. DeflateStream is a subclass of Stream, and has to follow that class's rules. Since the Read() method of Stream outputs the number of bytes requested, these must be uncompressed bytes.

Note that per the above, you do know the size of the compressed objects. It's not stored in the file, but you can derive that information from the things that are stored in the file.

2) The data I'm getting back is correct, I think (human-readable data that looks right); however, it's advancing the underlying stream I give it all the way to the end! For example, I ask it for 187 decompressed bytes and it reads the remaining 212 bytes, all the way to the end of the stream. That is, the whole stream is 228 bytes, and after the deflate read of 187 bytes the stream's position is now 228. I can't seek backwards, as I don't know where the end of the compressed data is, and not all the streams I use are seekable anyway. Is it expected behavior for it to consume the whole stream?

Yes, I would expect that to happen. Or at a minimum, I would expect some buffering to happen, so even if it didn't read all the way to the end of the stream, I would expect it to read at least some number of bytes past the end of the compressed data.
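
If you want to convince yourself, a small self-contained test along these lines (my own sketch, unrelated to the pack format) shows the underlying stream's Position ending up well past the compressed payload:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class DeflateOverreadDemo
{
    static void Main()
    {
        // Compress 187 bytes into raw deflate data.
        byte[] payload = new byte[187];
        new Random(0).NextBytes(payload);

        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var def = new DeflateStream(ms, CompressionMode.Compress, leaveOpen: true))
            {
                def.Write(payload, 0, payload.Length);
            }
            compressed = ms.ToArray();
        }

        // Append 50 trailing bytes that stand in for "the next object" in the file.
        var file = new MemoryStream();
        file.Write(compressed, 0, compressed.Length);
        file.Write(new byte[50], 0, 50);
        file.Position = 0;

        var output = new byte[payload.Length];
        using (var inflate = new DeflateStream(file, CompressionMode.Decompress, leaveOpen: true))
        {
            int total = 0;
            while (total < output.Length)
            {
                int n = inflate.Read(output, total, output.Length - total);
                if (n == 0) break;
                total += n;
            }
        }

        // Position is typically well past compressed.Length, because DeflateStream
        // pulls data from the underlying stream in buffered chunks.
        Console.WriteLine($"compressed length: {compressed.Length}, underlying position: {file.Position}");
    }
}
```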

It seems to me that you have at least a couple of options:

  1. For each block of data, compute the length of the data (per above), read that into a standalone MemoryStream object, and decompress the data from that stream rather than the original (see the sketch after this list).
  2. Alternatively, go ahead and decompress directly from the source stream, using the offsets provided in the index to seek to each data block as you read it. Of course, this won't work with non-seekable streams, which you indicate occur in your scenario. So this option would not work for all cases in your scenario.
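
Here is a sketch of the first option (my own illustration, not tested against real pack files; DecompressBlock and its parameter names are placeholders, and the 2-byte zlib-header skip is carried over from your description):

```csharp
using System.IO;
using System.IO.Compression;

static class PackBlockReader
{
    // Option 1: copy exactly compressedLength bytes of the block into a MemoryStream,
    // then decompress from that copy so the source stream is never advanced past the
    // end of the block.
    public static byte[] DecompressBlock(Stream source, int compressedLength, int decompressedLength)
    {
        // Read the whole compressed block (including the 2-byte zlib header) into memory.
        var compressed = new byte[compressedLength];
        int got = 0;
        while (got < compressed.Length)
        {
            int n = source.Read(compressed, got, compressed.Length - got);
            if (n == 0) throw new EndOfStreamException("Unexpected end of pack data.");
            got += n;
        }

        var result = new byte[decompressedLength];
        using (var copy = new MemoryStream(compressed, 2, compressed.Length - 2)) // skip zlib header
        using (var deflate = new DeflateStream(copy, CompressionMode.Decompress))
        {
            int total = 0;
            while (total < result.Length)
            {
                int n = deflate.Read(result, total, result.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        return result;
    }
}
```

Because DeflateStream only ever sees the in-memory copy, however much it buffers or over-reads no longer matters: when the method returns, the source stream sits exactly at the start of the next block.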
