How to program a text search and replace in PDF files


Question

How can I programmatically search and replace text in a large number of PDF files? I want to remove a URL that has been added to a set of files. I was able to remove the link itself using JavaScript under Batch Processing in Adobe Acrobat Pro, but the link text remains. I have seen recommendations to use the text touchup tool, which works manually, but I don't want to edit 1300 files by hand.

Recommended answer

Finding text in a PDF is inherently hard because of the graphical nature of the document format: the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-and-replace capabilities and heuristics. Give changepagestring.pl a try and see if it works on your PDFs.
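To make the "not contiguous" point concrete, here is a small sketch. The content-stream fragment and the URL in it are made up for illustration, and real streams are usually compressed, so this only shows the idea: one visible string can be stored as several pieces inside a TJ array, and a naive substring search then misses it. It relies on `grep -o`, which GNU and BSD grep provide.

```shell
# Hypothetical fragment of an uncompressed PDF content stream (not from a real file).
# The TJ operator draws an array of string pieces with kerning offsets in between.
stream='BT /F1 12 Tf [(http://exam) -15 (ple.com)] TJ ET'

# A plain substring search fails: the URL is split across two pieces.
naive_match=no
case "$stream" in
  *'http://example.com'*) naive_match=yes ;;
esac

# Extract each parenthesised piece and glue the pieces together
# to recover the text as a reader would see it.
joined=$(printf '%s\n' "$stream" | grep -o '([^()]*)' | tr -d '()\n')

echo "naive match: $naive_match"
echo "joined text: $joined"
```

This is roughly the kind of problem a tool's heuristics have to work around, which is why a replacement may succeed on some PDFs and not on others.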

Installation and usage:

 $ cpan install CAM::PDF
 # start a new terminal if this is your first cpan module
 $ changepagestring.pl input.pdf oldtext newtext output.pdf
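Since the question is about roughly 1300 files, the single command above could be wrapped in a loop. The following is only a sketch: `replace_in_dir` and the `REPLACE_CMD` variable are hypothetical names introduced here, and the argument order simply follows the usage line shown above. Output goes to a separate directory so the originals stay untouched.

```shell
# Hypothetical batch wrapper; REPLACE_CMD can be overridden (e.g. for a dry run).
REPLACE_CMD=${REPLACE_CMD:-changepagestring.pl}

# replace_in_dir IN_DIR OUT_DIR OLDTEXT NEWTEXT
# Runs the replace command on every *.pdf in IN_DIR and writes results to OUT_DIR.
replace_in_dir() {
  in_dir=$1
  out_dir=$2
  old=$3
  new=$4
  failed=0

  mkdir -p "$out_dir" || return 1

  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue            # the glob matched nothing
    out="$out_dir/$(basename "$f")"
    if ! "$REPLACE_CMD" "$f" "$old" "$new" "$out"; then
      echo "failed: $f" >&2            # keep going, report at the end
      failed=$((failed + 1))
    fi
  done

  [ "$failed" -eq 0 ]                  # non-zero status if any file failed
}

# Example call (assumed directory names and URL):
#   replace_in_dir ./pdfs ./cleaned 'http://example.com' ''
```

It is worth running this on a handful of files first and opening the results, since a command that exits successfully does not guarantee the text was actually found and replaced.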
