Editing PDF text using Java and iText


Question

Is there a way I can edit the text of a PDF document, such as finding and replacing specific text?

I have a PDF document which contains placeholders for text that I need to identify and replace, or just delete.

I am able to edit the PDF at specific coordinates (x, y), but I am unable to identify and replace text. All the libraries that I saw created PDF from scratch and small editing functionality. Is there any way I can do the editing described above using iText? Please advise. Thank you!

The oldest classical Greek and Latin writing had little or no spaces between words or other ones, and could be written in boustrophedon (alternating directions). Over time, text direction (left to right) became standardized, and word dividers and terminal punctuation became common.
**DATE:

FROM:
The first way to divide sentences into groups was the original paragraphos, similar to an underscore at the beginning of the new group
-----------------------------------------------------------**

Recommended answer

Allow me to copy the intro of chapter 6 of my book:


When I wrote the first book about iText, the publisher didn’t like the subtitle "Creating and Manipulating PDF." He didn’t like the word manipulating because of some of its pejorative meanings. If you consult the dictionary on Yahoo! education, you’ll find the following definitions:


  • To influence or manage shrewdly or deviously

  • To tamper with or falsify for personal gain

Obviously, that’s not what the book is about. The publisher suggested "Creating and Editing PDF" as a better subtitle. I explained that PDF isn’t a document format well suited for editing. PDF is an end product. It’s a display format. It’s not a word processing format.

In a word processing format, the content is distributed over different pages when you open the document in an application, not earlier. This has some disadvantages: if you open the same document in different applications, you can end up with a different page count. The same text snippet can be on page X when looked at in Microsoft Word, and on page Y when viewed in Open Office. That’s exactly the kind of problem you want to avoid by choosing PDF.

In a PDF document, every character or glyph on a PDF page has its fixed position, regardless of the application that’s used to view the document. This is an advantage, but it also comes with a disadvantage. Suppose you want to replace the word "edit" with the word "manipulate" in a sentence, you’d have to reflow the text. You’d have to reposition all the characters that follow that word. Maybe you’d even have to move a portion of the text to the next page. That’s not trivial, if not impossible.
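To make the reflow problem concrete, here is a minimal sketch in plain Java. It is not iText code and not part of the book excerpt. It assumes a hypothetical monospaced font in which every glyph advances 6 points, and a line that starts at x = 72 with a right margin at x = 210; the class and method names are made up for illustration.

```java
final class ReflowSketch {
    // Assumption: a monospaced font where every glyph advances by 6 points.
    static final double GLYPH_WIDTH = 6.0;

    /** X coordinate of each glyph when the line starts at startX. */
    static double[] glyphPositions(String line, double startX) {
        double[] xs = new double[line.length()];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = startX + i * GLYPH_WIDTH;
        }
        return xs;
    }

    /** Distance that every glyph following the replaced word has to move. */
    static double shiftAfterReplacement(String oldWord, String newWord) {
        return (newWord.length() - oldWord.length()) * GLYPH_WIDTH;
    }

    /** True if the line no longer fits between startX and rightMargin. */
    static boolean overflows(String line, double startX, double rightMargin) {
        return startX + line.length() * GLYPH_WIDTH > rightMargin;
    }

    public static void main(String[] args) {
        String before = "You can edit this file";
        String after = before.replace("edit", "manipulate");

        // Every glyph after the replaced word gets a new x coordinate.
        System.out.println("Shift: " + shiftAfterReplacement("edit", "manipulate"));

        // The longer line may cross the margin, which forces a new line break.
        System.out.println("Before overflows: " + overflows(before, 72, 210));
        System.out.println("After overflows: " + overflows(after, 72, 210));
    }
}
```

Even in this simplified model, a single word replacement changes the position of every following glyph and can push the line past the margin. A real PDF adds proportional fonts, kerning, and text split across several drawing operations, so the bookkeeping is considerably harder.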

If you want to "edit" a PDF, it’s advised that you change the original source of the document and remake the PDF. If the original document was written using Microsoft Word, change the Word document, and make the PDF from the new version of the Word document. Don’t expect any tool to be able to edit a PDF file the same way you’d edit a Word document.
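One way to follow this advice for a placeholder-based design is to keep the placeholders in an editable source (plain text, HTML, a Word template) and fill them in before the PDF is generated. The sketch below shows only the substitution step in plain Java. The `${NAME}` placeholder syntax, the class name `TemplateFill`, and the sample values are assumptions made for illustration; producing the PDF from the filled text is left to whatever tool created the original.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateFill {
    // Assumption: placeholders in the source document look like ${NAME}.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_]+)\\}");

    /**
     * Replaces each placeholder with its value. A placeholder that has
     * no value is deleted, that is, replaced with an empty string.
     */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("DATE", "2024-01-01");
        values.put("FROM", "Alice");

        String template = "DATE: ${DATE}\nFROM: ${FROM}\nNOTE: ${NOTE}";
        // The filled text is what you would hand to the PDF generator.
        System.out.println(fill(template, values));
    }
}
```

Because the substitution happens before layout, the PDF generator positions the final text itself and no reflow of an existing PDF is needed.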

This being said, the verb "to manipulate" also means


  • To move, arrange, operate, or control by the hands or by mechanical means, especially in a skillful manner

That’s exactly what you’re going to do in this chapter. Using iText, you’re going to manipulate the pages of a PDF file in a skillful manner. You’re going to treat a PDF document as if it were made of digital paper.

In your question, you say: "All the libraries that I saw created PDF from scratch and small editing functionality."

Well, that's only normal. It's inherent to the document format you've chosen. Your design that involves "placeholders for text that you need to identify and replace or just delete" is seriously flawed. It suffers from a wrong choice of document format. You should have chosen a format that is suited for editing. PDF isn't such a format.
