Error while trying to decompress stream in PDF
Question
I'm trying to decompress a stream from a PDF object in this file:
4 0 obj
<<
/Filter /FlateDecode
/Length 64
>>
stream
xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê
endstream
endobj
I copy-pasted this stream, in the same format as in the original file, into a file called Stream.file:
xœs
QÐw34V02UIS0´0P030PIQÐpÉÏKIUH-.ITH.-*Ê··×TÉRp
á T‰
Ê
This stream should translate to: Donde esta curro??
I added that Stream.file to a C# console application:
using System.IO;
using System.IO.Compression;

namespace Filters
{
    public static class FiltersLoader
    {
        public static void Parse()
        {
            var bytes = File.ReadAllBytes("Stream.file");
            var originalFileStream = new MemoryStream(bytes);
            using (var decompressedFileStream = new MemoryStream())
            using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}
However, it throws an exception while trying to copy it:
The archive entry was compressed using an unsupported compression method.
I'd like to know how to decode this stream with .NET code, if that's possible.
Thanks.
Recommended answer
The main problem is that the DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951), but the content of PDF streams with the FlateDecode filter is actually presented in the ZLIB Compressed Data Format (as per RFC 1950), which wraps the FLATE compressed data.
To fix this it suffices to drop the two-byte ZLIB header.
Another problem became clear in your first example document: that document is encrypted, so the stream contents therein have to be decrypted before they can be FLATE decoded.
### Drop ZLIB header to get to the FLATE encoded data
The DeflateStream class can decode a naked FLATE compressed stream (as per RFC 1951), but the content of PDF streams with the FlateDecode filter is actually presented in the ZLIB Compressed Data Format (as per RFC 1950), which wraps the FLATE compressed data.
Fortunately it is pretty easy to jump to the FLATE encoded data therein: one simply has to drop the first two bytes. (Strictly speaking, there might be a dictionary identifier between them and the FLATE encoded data, but this appears to be seldom used.)
In case of your code:
var bytes = File.ReadAllBytes("Stream.file");
var originalFileStream = new MemoryStream(bytes);
// Skip the two-byte ZLIB header so that DeflateStream sees the naked FLATE data.
originalFileStream.ReadByte();
originalFileStream.ReadByte();
using (var decompressedFileStream = new MemoryStream())
using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
{
    decompressionStream.CopyTo(decompressedFileStream);
}
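If you prefer not to drop the two bytes blindly, the header can be checked first. The following is only a minimal sketch, assuming the layout described in RFC 1950 (a CMF byte, a FLG byte, and an FDICT flag in bit 5 of FLG announcing a four-byte dictionary identifier). The names ZlibHelper, GetZlibHeaderLength and Inflate are made up for this illustration. Since DeflateStream offers no way to supply a preset dictionary, as far as I know, the sketch rejects such streams instead of skipping the identifier.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of any library.
public static class ZlibHelper
{
    // Validates the ZLIB header (RFC 1950) and returns its length in bytes.
    public static int GetZlibHeaderLength(byte[] data)
    {
        if (data == null || data.Length < 2)
            throw new InvalidDataException("Too short to contain a ZLIB header.");

        int cmf = data[0];
        int flg = data[1];

        // The low nibble of CMF is the compression method; 8 means DEFLATE.
        if ((cmf & 0x0F) != 8)
            throw new InvalidDataException("Compression method is not DEFLATE.");

        // CMF and FLG, read as a 16-bit big-endian number, must be a multiple of 31.
        if (((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("ZLIB header check failed.");

        // Bit 5 of FLG (FDICT) announces a preset dictionary identifier.
        if ((flg & 0x20) != 0)
            throw new NotSupportedException("Preset dictionaries are not handled in this sketch.");

        return 2;
    }

    // Skips the validated header and inflates the remaining FLATE data.
    public static byte[] Inflate(byte[] data)
    {
        int offset = GetZlibHeaderLength(data);
        using (var input = new MemoryStream(data, offset, data.Length - offset))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

With the bytes from the question this would be called as ZlibHelper.Inflate(File.ReadAllBytes("Stream.file")). On .NET 6 or later, System.IO.Compression.ZLibStream reads the RFC 1950 wrapper itself, so it can be used in place of DeflateStream without skipping anything by hand.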
### In case of encrypted PDFs, decrypt first
Your first example file pdf-test.pdf is encrypted, as indicated by the presence of an Encrypt entry in the trailer:
trailer
<</Size 37/Encrypt 38 0 R>>
startxref
116
%%EOF
Before decompressing stream contents, therefore, you have to decrypt them.
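To notice this situation early, one could look for the Encrypt entry before attempting any decompression. The following is a rough sketch only: PdfEncryptionCheck and TrailerHasEncryptEntry are hypothetical names, the check is a plain text search rather than real PDF parsing, and it assumes a classic trailer keyword is present (files that use cross-reference streams have none, in which case this sketch simply reports false). It does not decrypt anything.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class PdfEncryptionCheck
{
    // Returns true if the text following the last "trailer" keyword contains "/Encrypt".
    public static bool TrailerHasEncryptEntry(byte[] pdfBytes)
    {
        // ASCII decoding is sufficient for locating the keywords;
        // non-ASCII bytes become '?' and do not shift any positions.
        string text = Encoding.ASCII.GetString(pdfBytes);

        int trailerPos = text.LastIndexOf("trailer", StringComparison.Ordinal);
        if (trailerPos < 0)
            return false; // no classic trailer found, this sketch cannot tell

        return text.IndexOf("/Encrypt", trailerPos, StringComparison.Ordinal) >= 0;
    }
}
```

For example, PdfEncryptionCheck.TrailerHasEncryptEntry(File.ReadAllBytes("pdf-test.pdf")) (with using System.IO) should report true for a trailer like the one shown above, which tells you the stream bytes need decrypting before they are handed to DeflateStream.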