Error while trying to decompress stream in PDF

Problem description

I'm trying to decompress a stream from a PDF Object in this file:

 4 0 obj
<< 
/Filter /FlateDecode
/Length 64
>>
stream
xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê
endstream
endobj

I copied this stream, keeping the same format as in the original file, and pasted it into a file called Stream.file:

xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê

This stream should decode to: Donde esta curro??. I added the stream as Stream.file to a C# console application:

using System.IO;
using System.IO.Compression;

namespace Filters
{
    public static class FiltersLoader
    {
        public static void Parse()
        {
            var bytes = File.ReadAllBytes("Stream.file");
            var originalFileStream = new MemoryStream(bytes);

            using (var decompressedFileStream = new MemoryStream())
            using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }    
        }
    }
}

However, it throws an exception while trying to copy it:

The archive entry was compressed using an unsupported compression method.

I'd like to know how to decode this stream with .NET code, if that's possible.

Thanks.

Recommended answer

The main problem is that the DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951) but the content of PDF streams with FlateDecode filter actually is presented in the ZLIB Compressed Data Format (as per RFC 1950) wrapping FLATE compressed data.

To fix this it suffices to drop the two-byte ZLIB header.

Another problem became clear in your first example document: that document is encrypted, so the stream contents in it have to be decrypted before FLATE decoding.

The DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951) but the content of PDF streams with FlateDecode filter actually is presented in the ZLIB Compressed Data Format (as per RFC 1950) wrapping FLATE compressed data.

Fortunately, it is pretty easy to jump to the FLATE encoded data therein: one simply has to drop the first two bytes. (Strictly speaking, there might be a dictionary identifier between them and the FLATE encoded data, but this appears to be seldom used.)

For example, in the case of your code:

var bytes = File.ReadAllBytes("Stream.file");
var originalFileStream = new MemoryStream(bytes);

originalFileStream.ReadByte();
originalFileStream.ReadByte();

using (var decompressedFileStream = new MemoryStream())
using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
{
    decompressionStream.CopyTo(decompressedFileStream);
}   
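The two `ReadByte()` calls above skip the header without looking at it. A slightly more defensive variant could check the two bytes first, using the rules from RFC 1950: the low nibble of the first byte must be 8 (deflate), the two bytes read as a big-endian number must be a multiple of 31, and bit 5 of the second byte signals the dictionary identifier mentioned above. This is only a sketch; the class and method names (`ZlibHelper`, `InflateZlib`) are made up for illustration, and it assumes the whole stream fits in memory. The trailing Adler-32 checksum is not verified.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibHelper
{
    // Hypothetical helper: inflates a zlib-wrapped (RFC 1950) buffer with DeflateStream.
    public static byte[] InflateZlib(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new InvalidDataException("Too short to contain a zlib header.");

        int cmf = data[0];
        int flg = data[1];

        // RFC 1950: compression method 8 means deflate, and
        // CMF * 256 + FLG must be a multiple of 31.
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("Data does not start with a zlib header.");

        // FDICT (bit 5 of FLG) means a preset dictionary is required.
        // DeflateStream offers no way to supply one, so give up here.
        if ((flg & 0x20) != 0)
            throw new NotSupportedException("Preset dictionaries are not supported.");

        // Skip the two header bytes and inflate the raw FLATE data.
        using (var input = new MemoryStream(data, 2, data.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

With this helper, the example would become `var decoded = ZlibHelper.InflateZlib(File.ReadAllBytes("Stream.file"));`. On .NET 6 and later, `System.IO.Compression.ZLibStream` reads the RFC 1950 wrapper directly, so the manual header handling may not be needed there.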

In case of encrypted PDFs, decrypt first

Your first example file pdf-test.pdf is encrypted as is indicated by the presence of an Encrypt entry in the trailer:

trailer
<</Size 37/Encrypt 38 0 R>>
startxref
116
%%EOF

Before decompressing stream contents, therefore, you have to decrypt them.
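
To notice this situation early, one could scan the file for an Encrypt entry before attempting any FLATE decoding. The following is a rough heuristic, not a PDF parser: it only searches the raw bytes for the name `/Encrypt`, and the names `PdfEncryptionProbe` and `LooksEncrypted` are invented for this sketch. It does not decrypt anything; for that, a PDF library or an implementation of the standard security handler is needed.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfEncryptionProbe
{
    // Hypothetical heuristic: true if the raw bytes contain the name "/Encrypt".
    public static bool LooksEncrypted(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char,
        // so binary stream data does not disturb the search.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        return text.IndexOf("/Encrypt", StringComparison.Ordinal) >= 0;
    }
}
```

For the example file this would be called as `PdfEncryptionProbe.LooksEncrypted(File.ReadAllBytes("pdf-test.pdf"))`. Because it is a plain text search, it can report a false positive if the same characters happen to occur elsewhere in the file.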
