How to extract PDF watermark content using iText APIs


Question

I was going through the iText API docs and was able to create a PDF with a watermark image or text, but I did not find a method to get or extract watermark content from a PDF.

So I have a PDF document containing watermarked text or an image, and I want to extract that text or image and validate it, which I am not able to do.

How can I extract watermark content using iText APIs? Or is there any other way to validate watermark content?

By "validate" I mean: if I have an existing PDF/image with some watermarked text [as done in the 2nd link in the references], I want to check whether it has the expected text/image.

References:

  • http://itextpdf.com/themes/keyword.php?id=226
  • http://www.java-connect.com/itext/add-watermark-in-PDF-document-using-java-iText-library.html

Recommended answer


"How to extract watermark content using iText apis? Or is there any other way to validate watermark content?"



Extracting watermark content?

There is nothing special about watermarks in PDFs compared to regular page content. They merely

  • appear quite early in the content stream, so that content later in the stream is drawn above them; or they
  • appear quite late in the content stream but have some kind of transparency applied.

Actually, there is another type of watermark that is special: the so-called Watermark Annotations. As these annotations can easily be lost when documents are merged or otherwise manipulated, though, they are hardly ever used.
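
For completeness, checking for such annotations might look like the following minimal sketch. It assumes iText 5 (the `com.itextpdf.text.pdf` classes) on the classpath; the class name `WatermarkAnnotationCounter` and its methods are illustrative and not part of iText. It only detects annotations whose /Subtype is /Watermark and says nothing about watermarks drawn in the page content.

```java
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

// Hypothetical helper, not an iText class.
public class WatermarkAnnotationCounter {

    /** The /Subtype name used for watermark annotations in the PDF specification. */
    static final PdfName WATERMARK = new PdfName("Watermark");

    /** True if the given annotation dictionary has /Subtype /Watermark. */
    public static boolean isWatermark(PdfDictionary annotation) {
        return annotation != null
                && WATERMARK.equals(annotation.getAsName(PdfName.SUBTYPE));
    }

    /** Counts watermark annotations over all pages of an already opened document. */
    public static int count(PdfReader reader) {
        int found = 0;
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            // /Annots is optional, so a page without annotations yields null here.
            PdfArray annots = reader.getPageN(page).getAsArray(PdfName.ANNOTS);
            if (annots == null) {
                continue;
            }
            for (int i = 0; i < annots.size(); i++) {
                if (isWatermark(annots.getAsDict(i))) {
                    found++;
                }
            }
        }
        return found;
    }
}
```

A count of zero is the expected result for most watermarked documents, for the reason given above.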

Furthermore, different PDF-generating software suites that offer a way to add watermarks each do so in their own way. Thus, you cannot even recognize watermarks by some special operations done in some specific, unique pattern.

Even the iText examples you referred to apply different kinds of watermarks:

  • MovieCountries2 simply draws some large gray text using an angled baseline.
  • StampStationery copies a complete page from some PDF (which itself may visually have foreground and background material) into a separate object inside the target PDF and adds a reference to this object at the beginning of every page of the target.
  • InsertPages similarly references a page from some PDF on every newly generated target document page.

Thus, blind watermark extraction is virtually impossible.

You might try some validation, though, if you know what you are searching for. You just must not search some fixed watermark stream (no such thing exists in PDF) but instead the whole page content.

iText offers the classes of the parser package, which allow extraction of text and/or bitmap images from content streams. Look at the samples referenced from the keywords PARSING PDF > EXTRACTING IMAGES and PARSING PDF > EXTRACTING TEXT.
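
For the text side, a minimal sketch could look like this. It assumes iText 5 (package `com.itextpdf.text.pdf.parser`) on the classpath; the class name `WatermarkTextCheck`, its method names, and the whitespace and case normalization are illustrative choices, not part of iText.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

import java.io.IOException;
import java.util.Locale;

// Hypothetical helper, not an iText class.
public class WatermarkTextCheck {

    /** Collapses whitespace and lowercases, since extracted text often has odd spacing. */
    public static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    /** True if the expected text occurs in the page text after normalization. */
    public static boolean containsExpected(String pageText, String expected) {
        return normalize(pageText).contains(normalize(expected));
    }

    /** True only if every page of the PDF at pdfPath contains the expected text. */
    public static boolean everyPageContains(String pdfPath, String expected)
            throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        try {
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                // Extracts the text of the whole page content, watermark included.
                String text = PdfTextExtractor.getTextFromPage(reader, page);
                if (!containsExpected(text, expected)) {
                    return false;
                }
            }
            return true;
        } finally {
            reader.close();
        }
    }
}
```

Because the whole page content is searched, this cannot tell a watermark from body text that happens to contain the same words; it only confirms that the expected text is present somewhere on each page.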

You merely have to check whether the image or text you expect can be found by these classes, positioned and styled as you expect.
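
To also check positioning, a custom render listener can look at the baseline of each text chunk. The following is a minimal sketch assuming iText 5; `AngledTextFinder`, the angle tolerance, and the simple image counter are illustrative assumptions, not an iText facility. It collects text chunks whose baseline is rotated by roughly an expected angle, as an angled watermark would be.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.LineSegment;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
import com.itextpdf.text.pdf.parser.Vector;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener, not an iText class.
public class AngledTextFinder implements RenderListener {
    private final List<String> matches = new ArrayList<>();
    private final double expectedDegrees;
    private final double toleranceDegrees;
    private int imageCount = 0;

    public AngledTextFinder(double expectedDegrees, double toleranceDegrees) {
        this.expectedDegrees = expectedDegrees;
        this.toleranceDegrees = toleranceDegrees;
    }

    /** Angle of the vector (dx, dy) in degrees, counter-clockwise from the x axis. */
    public static double angleDegrees(double dx, double dy) {
        return Math.toDegrees(Math.atan2(dy, dx));
    }

    @Override public void beginTextBlock() { }
    @Override public void endTextBlock() { }

    @Override
    public void renderText(TextRenderInfo info) {
        LineSegment baseline = info.getBaseline();
        Vector start = baseline.getStartPoint();
        Vector end = baseline.getEndPoint();
        double angle = angleDegrees(end.get(Vector.I1) - start.get(Vector.I1),
                                    end.get(Vector.I2) - start.get(Vector.I2));
        // Simplification: no wrap-around handling near +/-180 degrees.
        if (Math.abs(angle - expectedDegrees) <= toleranceDegrees) {
            matches.add(info.getText());
        }
    }

    @Override
    public void renderImage(ImageRenderInfo info) {
        imageCount++; // only counted here; info.getImage() would give access to the bitmap
    }

    /** Text chunks found at the expected angle, in content stream order. */
    public List<String> getMatches() { return matches; }

    public int getImageCount() { return imageCount; }

    /** Runs the listener over one page (1-based) of the PDF at pdfPath. */
    public static AngledTextFinder scanPage(String pdfPath, int page,
            double expectedDegrees, double toleranceDegrees) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        try {
            return new PdfReaderContentParser(reader)
                    .processContent(page, new AngledTextFinder(expectedDegrees, toleranceDegrees));
        } finally {
            reader.close();
        }
    }
}
```

A caller could, for example, join `scanPage(path, 1, 45, 2).getMatches()` and compare the result with the expected watermark text. A watermark may arrive as several chunks, so joining before comparing is usually necessary.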
