How to export PDF form fields to XML automatically

Question

I have a PDF file that contains form fields, and I need to export the field data to an XML file automatically. Here is a screenshot of a sample form I created for testing:

Note: Exporting manually works fine in Acrobat Professional: click Tools > Form > Export Form Data and choose the xml extension for the output file. This is the result I get when I export manually:

<?xml version="1.0" encoding="UTF-8"?>
<fields>
    <first_name>John</first_name>
    <last_name>Doe</last_name>
</fields>

However, I need to automate this, e.g. with a Python script, a Java implementation, or a command-line tool. Which libraries or tools could I use to export form field data to XML? The tool or library should be open source so that I can integrate it into my workflow.

I already tried the Python pdfminer library, which let me export the static parts of the PDF file (such as "Static form header", "First name:" and "Last name:"). But how do I export the form field data (in my case, the contents of the form fields first_name and last_name)?

Edit: Feel free to download the sample.pdf file here.

Recommended answer

How about Apache PDFBox? It is open source and could fit your needs, since the website says "Extract forms data from PDF forms or prefill a PDF form."

Edit: Have a look at the PrintFields example.
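Since the question also mentions a Python script, here is a minimal sketch of the same idea in Python. It is not the PDFBox approach from the answer. It assumes the third-party pypdf package is installed and that its PdfReader.get_form_text_fields() method returns a mapping of field names to values. The helper names read_form_fields and fields_to_xml are hypothetical, and the sketch assumes every field name is a valid XML element name (as first_name and last_name are).

```python
from xml.sax.saxutils import escape


def fields_to_xml(fields):
    """Serialize a {field_name: value} mapping into the <fields> layout shown in the question."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<fields>"]
    for name, value in fields.items():
        # Assumption: `name` is already a valid XML element name.
        # Only the value is escaped; an empty field becomes an empty element.
        text = escape("" if value is None else str(value))
        lines.append(f"    <{name}>{text}</{name}>")
    lines.append("</fields>")
    return "\n".join(lines) + "\n"


def read_form_fields(pdf_path):
    """Read text form fields from a PDF.

    Assumption: pypdf is installed (pip install pypdf). It is imported lazily
    so that fields_to_xml can be used without it.
    """
    from pypdf import PdfReader

    return PdfReader(pdf_path).get_form_text_fields() or {}


# Hypothetical usage, with sample.pdf in the working directory:
#   xml_text = fields_to_xml(read_form_fields("sample.pdf"))
#   with open("sample.xml", "w", encoding="utf-8") as fh:
#       fh.write(xml_text)
```

Keeping the XML serialization separate from the PDF reading means the output format can be checked without a PDF at hand, and the reader function can be swapped for another library if pypdf does not fit.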
