Search and replace placeholder text in PDF with Python

Question

I need to generate a customized PDF copy of a template document. The easiest way, I thought, was to create a source PDF containing placeholder text wherever customization is needed, i.e. <first_name> and <last_name>, and then replace these with the correct values.

I've searched high and low, but is there really no way to take the source template PDF, replace the placeholders with actual values, and write the result to a new PDF?

I looked at PyPDF2 and ReportLab, but neither seems able to do this. Any suggestions? Most of my searches lead to the Perl library CAM::PDF, but I'd prefer to keep everything in Python.

Answer

There is no direct way to do this that will work reliably. PDFs are not like HTML: they specify the positioning of text character-by-character. They may not even include the whole font used to render the text, just the characters needed to render the specific text in the document. No library I've found will do nice things like re-wrap paragraphs after updating the text. PDFs are for the most part a display-only format, so you'll be much better off using a tool that turns markup into a PDF than updating the PDF in-place.
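
As a rough illustration of generating the PDF from the values instead of editing one, here is a minimal sketch using only the Python standard library. The values are substituted into plain text first (with `string.Template`, so placeholders are written `$first_name` rather than `<first_name>`), and only then is a one-page PDF written out. The helper name `render_pdf`, the page size, font, and layout are assumptions for illustration; it handles ASCII text only and does no line wrapping, which is exactly the work a real markup-to-PDF tool would do for you.

```python
from string import Template


def _pdf_literal(text: str) -> str:
    """Wrap text in a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def render_pdf(template: str, values: dict) -> bytes:
    """Fill a $placeholder text template, then emit a one-page PDF (ASCII only)."""
    filled = Template(template).substitute(values)  # KeyError if a value is missing

    # Content stream: 12 pt Helvetica, 14 pt leading, starting near the top-left
    # of a US Letter page; T* moves down one line after each Tj.
    ops = ["BT", "/F1 12 Tf", "14 TL", "72 720 Td"]
    for line in filled.splitlines():
        ops.append(f"{_pdf_literal(line)} Tj T*")
    ops.append("ET")
    content = "\n".join(ops).encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Usage would be along the lines of writing `render_pdf("Dear $first_name $last_name,", {"first_name": "Jane", "last_name": "Doe"})` to a file opened in `"wb"` mode. Because the substitution happens on plain text, a placeholder can never be split across positioned glyph runs the way it can inside an existing PDF.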

If that's not an option, you can create a PDF form in a tool such as Acrobat, then fill it with a PDF manipulation library such as iText (AGPL) or PDFBox; PDFBox has a nice Clojure wrapper called pdfboxing that can handle some of this.
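
For the form route, the values can be kept outside the PDF in an FDF (Forms Data Format) file, which form-filling tools can merge into the template. The sketch below assumes the template is a PDF form whose text fields are hypothetically named `first_name` and `last_name`; `build_fdf` is an illustrative helper, not a library API, and it only handles ASCII values.

```python
def _pdf_string(text: str) -> str:
    """Return text as a PDF literal string with backslash and parentheses escaped."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_fdf(values: dict) -> bytes:
    """Build a minimal FDF file assigning each value to the form field of the same name."""
    # /T is the field name, /V is the value to put in it.
    fields = " ".join(
        f"<< /T {_pdf_string(name)} /V {_pdf_string(value)} >>"
        for name, value in values.items()
    )
    text = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [ {fields} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return text.encode("ascii")  # non-ASCII values would need a different encoding
```

The merge itself is left to a PDF tool. For example, pdftk (not mentioned in the answer) can do it with `pdftk template.pdf fill_form data.fdf output filled.pdf flatten`, where `flatten` makes the filled fields non-editable.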

In my experience, Python's support for writing PDFs is fairly limited; Java has by far the best PDF library support. You also get what you pay for, so if you're using this for commercial purposes, an iText license is probably worth paying for. I've had pretty good results writing Python wrappers around PDF-manipulation CLI tools like pdfboxing and Ghostscript. That will probably be much easier for your use case than trying to shoehorn this into Python's PDF ecosystem.
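
A sketch of the wrapper-around-a-CLI approach using Ghostscript: the template is kept as a small PostScript program, which is plain text, so replacing `<first_name>`-style placeholders is an ordinary string operation, and Ghostscript's `pdfwrite` device then turns it into a PDF. This assumes a `gs` executable on `PATH` (on Windows the console binary is usually `gswin64c`, passed via the `gs` parameter); the helper names and page layout are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def _ps_string(text: str) -> str:
    """Return text as a PostScript string literal with backslash and parentheses escaped."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_postscript(lines: list, values: dict) -> str:
    """Replace <key> placeholders in each line and lay the lines out as PostScript."""
    program = ["%!PS", "/Helvetica findfont 12 scalefont setfont"]
    y = 720  # assumes a US Letter page: start near the top, 14 pt per line
    for line in lines:
        for key, value in values.items():
            line = line.replace(f"<{key}>", value)
        program.append(f"72 {y} moveto {_ps_string(line)} show")
        y -= 14
    program.append("showpage")
    return "\n".join(program) + "\n"


def ghostscript_command(ps_path, pdf_path, gs: str = "gs") -> list:
    """Command line converting a PostScript file to PDF with the pdfwrite device."""
    return [gs, "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
            "-sDEVICE=pdfwrite", f"-sOutputFile={pdf_path}", str(ps_path)]


def render_with_ghostscript(lines: list, values: dict, pdf_path, gs: str = "gs") -> None:
    """Write the filled PostScript to a temp file and let Ghostscript produce the PDF."""
    if shutil.which(gs) is None:
        raise FileNotFoundError(f"Ghostscript executable {gs!r} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = Path(tmp) / "page.ps"
        ps_path.write_text(build_postscript(lines, values), encoding="ascii")
        subprocess.run(ghostscript_command(ps_path, pdf_path, gs), check=True)
```

Keeping command construction in its own function makes the wrapper easy to test without Ghostscript installed, while `check=True` surfaces a failed conversion as a `CalledProcessError` instead of a silently missing file.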
