PDF document manipulation
Problem description
I have several PDFs with the following properties:
Each PDF contains a variable number of "documents", each with a different number of pages.
Each page in a "document" has text such as "Page 3 of 26".
I want to be able to automatically identify the first and last page of each "document" within a PDF (Note: this is not the same as the first and last page of a PDF as each PDF may contain several "documents") and extract these into a new PDF for later printing and archival.
I'm not sure what tools I can bring to bear on this problem and what libraries are available to tackle this.
Any recommendations? Preferably something free that can be used to create a tool that will run on Windows.
Recommended answer
Java has a nice free PDF library. Check out iText.
From the iText website:
You can use iText to:
- Serve PDF to a browser
- Generate dynamic documents from XML files or databases
- Use PDF's many interactive features
- Add bookmarks, page numbers, watermarks, etc.
- Split, concatenate, and manipulate PDF pages
- Automate filling out of PDF forms
- Add digital signatures to a PDF file
- And much more...
Since it's Java, there should be no issues running on Windows, or anywhere else for that matter.
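To make the page-selection step concrete, here is a minimal sketch of the logic in plain Java. It assumes the text of every page has already been extracted into a list of strings (for example with a PDF library such as iText), and that each page carries a label of exactly the form "Page N of M" as described in the question. The class name PageSelector and the method firstAndLastPages are hypothetical names made up for this illustration; they are not part of iText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any PDF library.
public class PageSelector {

    // Assumed label format from the question: "Page 3 of 26".
    private static final Pattern PAGE_LABEL =
            Pattern.compile("Page\\s+(\\d{1,9})\\s+of\\s+(\\d{1,9})");

    /**
     * Returns the 1-based PDF page numbers that are the first or last page
     * of an embedded "document". pageTexts.get(i) is assumed to hold the
     * extracted text of PDF page i + 1.
     */
    public static List<Integer> firstAndLastPages(List<String> pageTexts) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            Matcher m = PAGE_LABEL.matcher(pageTexts.get(i));
            if (!m.find()) {
                continue; // no label found on this page: skip it
            }
            int page = Integer.parseInt(m.group(1));
            int total = Integer.parseInt(m.group(2));
            // "Page 1 of M" starts a document, "Page M of M" ends it.
            // A one-page document matches both tests but is added once.
            if (page == 1 || page == total) {
                selected.add(i + 1);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three "documents" of 3, 1 and 2 pages in one PDF.
        List<String> texts = Arrays.asList(
                "Invoice ... Page 1 of 3", "Page 2 of 3", "Page 3 of 3",
                "Page 1 of 1",
                "Page 1 of 2", "Page 2 of 2");
        System.out.println(firstAndLastPages(texts)); // prints [1, 3, 4, 5, 6]
    }
}
```

The resulting page numbers can then be handed to whatever page-extraction facility the chosen PDF library offers in order to write the new PDF. Pages without a recognisable label are skipped in this sketch; a real tool might want to log them instead.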