Encoding of PDF text string

Views: 156 | Published: 2020/5/25 4:03:01 | Tag: pdf

Problem description

I am working on a parser for PDF text extraction.

When a page needs to be Flate-decoded (zlib compression), my code is able to decompress the content stream, and the output (the stream object) looks something like this:

BT
56.8 721.3 Td 
/F2 12 Tf
[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]TJ
ET
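
For reference, the decompression step described above can be sketched as follows. This is a minimal sketch, not the actual parser: the helper name `flate_decode` is hypothetical, and it assumes the input is exactly the bytes between `stream` and `endstream` of an object whose filter is FlateDecode, with no predictor parameters applied.

```python
import zlib


def flate_decode(raw: bytes) -> bytes:
    """Decompress the raw bytes of a FlateDecode stream (hypothetical helper)."""
    # FlateDecode data is in zlib format, so zlib.decompress handles it directly.
    # Assumption: no predictor post-processing is needed for this stream.
    return zlib.decompress(raw)


# Round trip on the sample content stream shown above.
sample = (
    b"BT\n"
    b"56.8 721.3 Td\n"
    b"/F2 12 Tf\n"
    b"[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]TJ\n"
    b"ET\n"
)
compressed = zlib.compress(sample)
restored = flate_decode(compressed)
```

Since the decompressed output is already readable operators (`BT`, `Td`, `Tf`, `TJ`, `ET`), this step appears to be working as intended.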

I am interested in the string array (the operand of TJ).
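
To make the structure of that operand concrete, here is a minimal sketch that splits the array into its elements. The function name `parse_tj_array` is hypothetical, and the sketch assumes the array contains only hex strings (`<...>`) and numeric position adjustments, as in the sample above; literal strings in parentheses are not handled.

```python
import re

# Matches either a hex string <...> or a number (a position adjustment).
TOKEN = re.compile(r"<([0-9A-Fa-f\s]*)>|([+-]?(?:\d+\.?\d*|\.\d+))")


def parse_tj_array(operand: str):
    """Split a TJ operand such as '[<01>2<0203>]' into bytes and floats."""
    inner = operand.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    items = []
    for match in TOKEN.finditer(inner):
        hex_digits, number = match.groups()
        if hex_digits is not None:
            digits = re.sub(r"\s+", "", hex_digits)
            if len(digits) % 2:
                digits += "0"  # an odd-length hex string is padded with 0
            items.append(bytes.fromhex(digits))
        else:
            items.append(float(number))
    return items


elements = parse_tj_array(
    "[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]"
)
# Keep only the string elements and look at their byte values.
codes = list(b"".join(e for e in elements if isinstance(e, bytes)))
```

The byte values obtained this way are character codes interpreted through the font selected by `Tf` (here `/F2`), so the sketch stops at the raw codes and does not attempt to map them to Unicode.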

It seems that this array contains multiple hex-encoded strings, but the corresponding hex values do not make sense. Instead, they look like a sequence such as 010203..., as if some kind of LZ77 compression had been applied.

Does PDF have multiple levels of compression?
How do I get plain text from the string array above?
