How to export PDF form fields to XML automatically
Question
I have a PDF file that includes form fields, and I need to export the field data to an XML file automatically. Here is a screenshot of a sample form I created for testing:
Note: Exporting manually with Acrobat Professional works fine: click Tools > Form > Export Form Data and choose the xml extension for the output file. This is the result I get when I export it manually:
<?xml version="1.0" encoding="UTF-8"?>
<fields>
<first_name>John</first_name>
<last_name>Doe</last_name>
</fields>
However, I need to automate this, e.g. with a Python script, a Java implementation, or a command-line tool. Which libraries or tools could I use to export form field data to XML? The tool or library should be open source so that I can integrate it into my workflow.
I already tried the Python pdfminer library, which helped me export the static parts of the PDF file (such as Static form header, First name: and Last name:). But how do I export the form field data (in my case, the contents of the form fields first_name and last_name)?
Edit: Feel free to download the sample.pdf file here.
Recommended answer
How about Apache PDFBox? It is open source and could fit your needs, since the website says "Extract forms data from PDF forms or prefill a PDF form."
Edit: Have a look at the PrintFields example.
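For the Python route mentioned in the question, the same idea (walk the form fields and write one XML element per field) might look like the sketch below. Note that it does not use PDFBox: it assumes the third-party pypdf package and its `PdfReader.get_fields()` method, and the helper names `read_form_fields` and `fields_to_xml` are made up for illustration. It also assumes every field name is a valid XML element name, as `first_name` and `last_name` are.

```python
import xml.etree.ElementTree as ET


def read_form_fields(pdf_path):
    """Return {field_name: value} for the form fields of a PDF."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so the XML helper below stays usable without it.
    from pypdf import PdfReader

    # get_fields() returns None when the document has no form fields.
    fields = PdfReader(pdf_path).get_fields() or {}
    return {name: field.value for name, field in fields.items()}


def fields_to_xml(values):
    """Build a <fields> document with one child element per form field."""
    root = ET.Element("fields")
    for name, value in values.items():
        # The field name becomes the tag name, so it must be a valid XML name.
        child = ET.SubElement(root, name)
        child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

Calling `fields_to_xml(read_form_fields("sample.pdf"))` should then give a `<fields>` element similar to the manual Acrobat export, although without the XML declaration and without line breaks between elements.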