Extract Stream-Dump from PDF-Body with PDFBox

Views: 241 | Published: 2020/10/27 0:44:31 | Tags: stream, pdfbox, dump


Problem description

I want to extract a stream dump from a PDF with PDFBox. Is this possible?

I want to get the original hex code of the PDF's content, like this:

BT /F19 8.9664 Tf 96.197 606.119 Td [(Kommunikation)]TJ
ET
q
1 0 0 1 85.238 594.35 cm
[]0 d 0 J 0.398 w 0 0 m 0 7.352 l S
Q
BT
/F19 8.9664 Tf 133.856 595.758 Td [(Erster)-600(Testuebertrag)-600(auf)-600(die)-600(Neuentwicklung)-600(fuer)-600(die)-600(PSA)-600(Direktbank)-600(ma)]TJ
ET
q
1 0 0 1 85.238 583.989 cm
[]0 d 0 J 0.398 w 0 0 m 0 7.352 l S
Q
BT
/F19 8.9664 Tf 133.856 585.397 Td [(l)-600(mit)-600(sehr)-600(langen)-600(Verwendungszweck)-600(gleich)-600(zum)-600(testen)-600(wann)-600(dieser)-600(cuted)]TJ
ET

thx

Solution

For a single use, run PDFDebugger and look for "Contents".

For repeated use, this code dumps the content stream of the first page:

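import java.io.File;
import java.io.InputStream;
import org.apache.pdfbox.io.IOUtils;        // assuming PDFBox's own IOUtils
import org.apache.pdfbox.pdmodel.PDDocument;
// PDFBox 2.x style API (PDDocument.load); both resources are closed automatically
try (PDDocument doc = PDDocument.load(new File("XXX.pdf"));
        InputStream contents = doc.getPage(0).getContents())
{
    IOUtils.copy(contents, System.out);
}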

Note that this only dumps the page content stream. There may be other content streams in form XObjects, patterns, soft masks, and annotation appearance streams. PDF is quite complex.
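
The question asks for hex code, while copying the stream to System.out prints the operators as text. If a hex view of the same bytes is wanted, a small helper along these lines might do. This is only a sketch: HexDump and toHex are hypothetical names, not part of PDFBox, and the sample bytes in main stand in for a real content stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HexDump {

    // Hypothetical helper (not a PDFBox API): reads the stream to its end and
    // returns the bytes as uppercase hex pairs, 16 bytes per line.
    static String toHex(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (count > 0) {
                // newline after every 16th byte, a space between the others
                sb.append(count % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02X", b));
            count++;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data: the first operators of the content stream in the question.
        byte[] sample = "BT /F19 8.9664 Tf".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(toHex(in));
        }
    }
}
```

Any InputStream can be passed to toHex, so the stream returned by doc.getPage(0).getContents() could be passed in place of the sample bytes.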
