Error while trying to decompress stream in PDF

Question

I'm trying to decompress a stream from a PDF Object in this file:

 4 0 obj
<< 
/Filter /FlateDecode
/Length 64
>>
stream
xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê
endstream
endobj

I copy-pasted this stream, in the same format as in the original file, into a file called Stream.file:

xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê

This stream should decode to: Donde esta curro??. I added the stream as Stream.file to a C# console application:

using System.IO;
using System.IO.Compression;

namespace Filters
{
    public static class FiltersLoader
    {
        public static void Parse()
        {
            var bytes = File.ReadAllBytes("Stream.file");
            var originalFileStream = new MemoryStream(bytes);

            using (var decompressedFileStream = new MemoryStream())
            using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }    
        }
    }
}

However, it throws an exception while trying to copy it:

The archive entry was compressed using an unsupported compression method.

I'd like to know how to decode this stream with .NET code, if that's possible.

Thanks.

Recommended answer

The main problem is that the DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951) but the content of PDF streams with FlateDecode filter actually is presented in the ZLIB Compressed Data Format (as per RFC 1950) wrapping FLATE compressed data.

To fix this it suffices to drop the two-byte ZLIB header.

Another problem became clear in your first example document: That document was encrypted, so before FLATE decoding the stream contents therein have to be decrypted.

### Drop ZLIB header to get to the FLATE encoded data

The DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951) but the content of PDF streams with FlateDecode filter actually is presented in the ZLIB Compressed Data Format (as per RFC 1950) wrapping FLATE compressed data.

Fortunately it is pretty easy to jump to the FLATE encoded data therein, one simply has to drop the first two bytes. (Strictly speaking there might be a dictionary identifier between them and the FLATE encoded data but this appears to be seldom used.)
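As a rough illustration of what those two bytes contain, the sketch below parses a ZLIB header as described in RFC 1950 and reports how many bytes precede the FLATE data. The names `ZlibHeader` and `GetDeflateOffset` are made up for this example and are not part of the .NET class library. The bytes 0x78 0x9C used in the usage note would match the `xœ` visible at the start of the stream in the question, assuming the dump was rendered as Windows-1252.

```csharp
using System;
using System.IO;

public static class ZlibHeader
{
    // Returns the offset of the first raw DEFLATE byte in a ZLIB stream (RFC 1950).
    // Hypothetical helper for illustration only.
    public static int GetDeflateOffset(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new InvalidDataException("Too short to contain a ZLIB header.");

        int cmf = data[0]; // compression method (low 4 bits), window size (high 4 bits)
        int flg = data[1]; // check bits, FDICT flag (bit 5), compression level

        if ((cmf & 0x0F) != 8)
            throw new InvalidDataException("Compression method is not DEFLATE.");

        // RFC 1950: CMF and FLG, read as a 16-bit big-endian number, must be a multiple of 31.
        if (((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("ZLIB header check failed.");

        // With FDICT set, a four-byte dictionary identifier follows the two header bytes.
        bool hasDictionary = (flg & 0x20) != 0;
        return hasDictionary ? 6 : 2;
    }
}
```

For example, `ZlibHeader.GetDeflateOffset(new byte[] { 0x78, 0x9C })` returns 2. If the dictionary flag is set, skipping the identifier alone would presumably not be enough, because the decoder also needs the preset dictionary itself.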

In the case of your code:

var bytes = File.ReadAllBytes("Stream.file");
var originalFileStream = new MemoryStream(bytes);

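// Skip the two-byte ZLIB header (RFC 1950) so that DeflateStream
// starts reading at the raw FLATE data.
originalFileStream.ReadByte();
originalFileStream.ReadByte();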

using (var decompressedFileStream = new MemoryStream())
using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
{
    decompressionStream.CopyTo(decompressedFileStream);
}   
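To try the approach without a PDF at hand, the following sketch builds ZLIB-wrapped sample data and decodes it in two ways: by skipping the two header bytes before `DeflateStream`, and with `ZLibStream`, which handles the wrapper itself. It assumes .NET 6 or later, where `System.IO.Compression.ZLibStream` is available. The class and method names (`FlateDemo`, `MakeZlibSample`, and so on) are invented for this example, and the sample data only stands in for real PDF stream bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FlateDemo
{
    // Decodes ZLIB-wrapped data by skipping the two-byte header and using DeflateStream.
    public static byte[] InflateSkippingHeader(byte[] zlibData)
    {
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }

    // Decodes the same data with ZLibStream, which reads the ZLIB wrapper itself (.NET 6+).
    public static byte[] InflateWithZLibStream(byte[] zlibData)
    {
        using (var input = new MemoryStream(zlibData))
        using (var inflater = new ZLibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }

    // Produces ZLIB-wrapped sample data, standing in for the bytes of a PDF stream.
    public static byte[] MakeZlibSample(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var deflater = new ZLibStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.ASCII.GetBytes(text);
                deflater.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the wrapping stream has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```

Calling `FlateDemo.InflateSkippingHeader(FlateDemo.MakeZlibSample("Donde esta curro??"))` and decoding the result as ASCII should give back the input text, and `InflateWithZLibStream` should produce the same bytes. Where .NET 6 or later is available, `ZLibStream` is probably the simpler choice, since no manual header handling is needed.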

### In case of encrypted PDFs, decrypt first

Your first example file, pdf-test.pdf, is encrypted, as indicated by the presence of an Encrypt entry in the trailer:

trailer
<</Size 37/Encrypt 38 0 R>>
startxref
116
%%EOF

Before decompressing stream contents, therefore, you have to decrypt them.
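A quick way to spot this situation before attempting any decompression is to look for the Encrypt key after the last `trailer` keyword. The sketch below is only a heuristic under stated assumptions: it assumes a classic trailer like the one shown above (files that use cross-reference streams keep this entry elsewhere), it does no real PDF parsing, and the names `PdfEncryptionCheck` and `TrailerMentionsEncrypt` are invented for this example. It uses `Encoding.Latin1`, which requires .NET 5 or later.

```csharp
using System;
using System.Text;

public static class PdfEncryptionCheck
{
    // Heuristic only: reports whether "/Encrypt" occurs after the last "trailer" keyword.
    public static bool TrailerMentionsEncrypt(byte[] pdfBytes)
    {
        // Latin-1 maps every byte to exactly one char, so binary content survives the conversion.
        string text = Encoding.Latin1.GetString(pdfBytes);

        int trailerPos = text.LastIndexOf("trailer", StringComparison.Ordinal);
        if (trailerPos < 0)
            return false; // no classic trailer found, so this check cannot tell

        return text.IndexOf("/Encrypt", trailerPos, StringComparison.Ordinal) >= 0;
    }
}
```

A `false` result does not prove that a file is unencrypted; it only means this simple text search found no Encrypt entry in a classic trailer.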
