How to read from a file containing multiple GzipStreams

Question

I have a file created with code that looks like this:

        using (var fs=File.OpenWrite("tmp"))
        {
            using (GZipStream gs=new GZipStream(fs,CompressionMode.Compress,true))
            {
                using (StreamWriter sw=new StreamWriter(gs))
                {
                    sw.WriteLine("hello ");
                }
            }

            using (GZipStream gs = new GZipStream(fs, CompressionMode.Compress, true))
            {
                using (StreamWriter sw = new StreamWriter(gs))
                {
                    sw.WriteLine("world");
                }
            }
        }

Now I'm trying to read the data back from this file with the following code:

        string txt;

        using (var fs=File.OpenRead("tmp"))
        {
            using (GZipStream gs=new GZipStream(fs,CompressionMode.Decompress,true))
            {
                using (var rdr = new StreamReader(gs))
                {
                    txt = rdr.ReadToEnd();
                }
            }

            using (GZipStream gs = new GZipStream(fs, CompressionMode.Decompress, true))
            {
                using (StreamReader sr = new StreamReader(gs))
                {
                    txt+=sr.ReadToEnd();
                }
            }
        }

The first stream reads fine, but the second stream is not read.

How can I read the second stream?

Recommended answer

This is a problem with the way GZipStream handles gzip files that contain multiple gzip entries (called members in the gzip specification). It reads the first entry and treats all succeeding entries as garbage. Interestingly, utilities such as gzip and WinZip handle such files correctly by extracting all entries into one file. There are a couple of workarounds, or you can use a third-party library such as DotNetZip (http://dotnetzip.codeplex.com/).

Perhaps the easiest workaround is to scan the file for all of the gzip headers, then manually move the stream to each one and decompress its content. You can find the headers by looking for ID1 (0x1F), ID2 (0x8B), and 0x8 (the Deflate compression method) in the raw file bytes; see the specification: http://www.gzip.org/zlib/rfc-gzip.html. The same three bytes can also occur by chance inside compressed data, so they are not always enough to guarantee that you're looking at a gzip header. Read the rest of the header (or at least the first ten bytes) to verify:

    const int Id1 = 0x1F;
    const int Id2 = 0x8B;
    const int DeflateCompression = 0x8;
    const int GzipFooterLength = 8;
    const int MaxGzipFlag = 32; 

    /// <summary>
    /// Returns true if the stream could be a valid gzip header at the current position.
    /// </summary>
    /// <param name="stream">The stream to check.</param>
    /// <returns>Returns true if the stream could be a valid gzip header at the current position.</returns>
    public static bool IsHeaderCandidate(Stream stream)
    {
        // Read the first ten bytes of the stream
        byte[] header = new byte[10];

        int bytesRead = stream.Read(header, 0, header.Length);
        stream.Seek(-bytesRead, SeekOrigin.Current);

        if (bytesRead < header.Length)
        {
            return false;
        }

        // Check the id tokens and compression algorithm
        if (header[0] != Id1 || header[1] != Id2 || header[2] != DeflateCompression)
        {
            return false;
        }

        // Check the GZIP flags: only the low five bits are defined (2^5 = 32),
        // so a value of 32 or more sets a reserved bit and cannot be a valid header
        if (header[3] >= MaxGzipFlag)
        {
            return false;
        }

        // Check the extra flags (XFL): with Deflate this is 2 or 4, or 0 when unset
        if (header[8] != 0x0 && header[8] != 0x2 && header[8] != 0x4)
        {
            return false;
        }

        return true;
    }
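
The scan itself could look roughly like the sketch below. It is an illustration rather than part of the original answer: `GzipScanner` and `FindHeaderCandidates` are hypothetical names, the file is assumed to be small enough to load into a byte array, and the header checks are repeated inline so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class GzipScanner
{
    // Hypothetical helper: returns every offset in data at which a gzip header could start.
    public static List<int> FindHeaderCandidates(byte[] data)
    {
        var offsets = new List<int>();

        // The fixed part of a gzip header is ten bytes, so stop ten bytes before the end.
        for (int i = 0; i + 10 <= data.Length; i++)
        {
            // ID1, ID2 and the Deflate compression method
            bool magic = data[i] == 0x1F && data[i + 1] == 0x8B && data[i + 2] == 0x08;

            // The three high bits of the flags byte are reserved and must be zero
            bool flagsOk = (data[i + 3] & 0xE0) == 0;

            // Extra flags (XFL): 0, 2 or 4 with Deflate
            byte xfl = data[i + 8];
            bool xflOk = xfl == 0 || xfl == 2 || xfl == 4;

            if (magic && flagsOk && xflOk)
            {
                offsets.Add(i);
            }
        }

        return offsets;
    }
}
```

The offsets it returns are only candidates. A byte sequence inside compressed data can still pass all three checks, so a caller should be prepared for decompression to fail at a false positive.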



Note that GZipStream reads from the underlying stream in blocks, so if you hand it the file stream directly it may leave the position well past the entry it just decompressed, possibly at the end of the file. You may want to read each part into a MemoryStream and then decompress each part individually in memory.
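
A minimal sketch of the in-memory approach follows, assuming the whole file has been read into a byte array and the start offset of every entry is already known (for example from a header scan). `GzipMembers` and `DecompressMembers` are hypothetical names, and the text is assumed to be UTF-8, which is what StreamWriter produces by default.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipMembers
{
    // Hypothetical helper: decompresses each entry separately and concatenates the text.
    // Assumes offsets is sorted and every offset is the true start of an entry.
    public static string DecompressMembers(byte[] data, IList<int> offsets)
    {
        var text = new StringBuilder();

        for (int i = 0; i < offsets.Count; i++)
        {
            int start = offsets[i];
            // An entry runs up to the next entry's start, or to the end of the data
            int end = i + 1 < offsets.Count ? offsets[i + 1] : data.Length;

            // Wrap only this entry's bytes, so GZipStream cannot read past them
            using (var part = new MemoryStream(data, start, end - start))
            using (var gz = new GZipStream(part, CompressionMode.Decompress))
            using (var reader = new StreamReader(gz, Encoding.UTF8))
            {
                text.Append(reader.ReadToEnd());
            }
        }

        return text.ToString();
    }
}
```

Because each MemoryStream exposes exactly one entry, it does not matter how far GZipStream reads ahead. If an offset is a false positive, GZipStream is likely to throw an InvalidDataException for that slice, so production code would need to handle that case.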

An alternative approach is to modify the gzip headers so that they specify the length of the content. You then don't have to scan the file for headers, because you can determine the offset of each one programmatically. This requires diving a bit deeper into the gzip spec.
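
One possible shape for this idea is sketched below. It is not from the original answer and rests on several assumptions: each entry is compressed into memory first, GZipStream is assumed to emit the minimal ten-byte header with no optional fields, and the total size of the entry is stored in the optional "extra field" that the gzip specification allows after the fixed header. The names `SizedGzip`, `WriteMember` and `ReadAll`, and the subfield ID "SZ", are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SizedGzip
{
    const int FixedHeaderLength = 10; // ID1 ID2 CM FLG MTIME(4) XFL OS
    const byte FExtra = 0x04;         // FLG bit 2: an extra field follows the fixed header
    const int ExtraLength = 8;        // SI1 SI2 LEN(2) plus a 4-byte payload

    // Compresses text into one gzip entry whose extra field stores the entry's total size.
    public static byte[] WriteMember(string text)
    {
        byte[] plain;
        using (var buffer = new MemoryStream())
        {
            using (var gz = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gz.Write(bytes, 0, bytes.Length);
            }
            plain = buffer.ToArray();
        }

        // Assumption: GZipStream wrote a header without optional fields (FLG == 0)
        if (plain[3] != 0)
            throw new InvalidDataException("Expected a header without optional fields.");

        int total = plain.Length + 2 + ExtraLength;
        var member = new byte[total];
        Array.Copy(plain, 0, member, 0, FixedHeaderLength);
        member[3] = FExtra;

        int p = FixedHeaderLength;
        member[p++] = ExtraLength; member[p++] = 0;       // XLEN, little-endian
        member[p++] = (byte)'S'; member[p++] = (byte)'Z'; // made-up, unregistered subfield ID
        member[p++] = 4; member[p++] = 0;                 // subfield payload length
        for (int shift = 0; shift < 32; shift += 8)
            member[p++] = (byte)(total >> shift);         // total entry size, little-endian

        Array.Copy(plain, FixedHeaderLength, member, p, plain.Length - FixedHeaderLength);
        return member;
    }

    // Walks a buffer of entries written by WriteMember, using the stored sizes as offsets.
    public static string ReadAll(byte[] data)
    {
        var text = new StringBuilder();
        int offset = 0;
        while (offset < data.Length)
        {
            int sizeAt = offset + FixedHeaderLength + 6;  // skip XLEN, SI1, SI2, LEN
            int size = data[sizeAt] | data[sizeAt + 1] << 8
                     | data[sizeAt + 2] << 16 | data[sizeAt + 3] << 24;
            if (size <= 0 || size > data.Length - offset)
                throw new InvalidDataException("Stored entry size is out of range.");

            using (var part = new MemoryStream(data, offset, size))
            using (var gz = new GZipStream(part, CompressionMode.Decompress))
            using (var reader = new StreamReader(gz, Encoding.UTF8))
                text.Append(reader.ReadToEnd());

            offset += size;
        }
        return text.ToString();
    }
}
```

Each entry remains a valid gzip stream, since decoders are expected to skip extra fields they do not recognize. The trade-off is that the reader only works on files produced by the matching writer.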
