How to extract data from a PDF?

Question

My company receives data from an external company as Excel files, which we import into SQL Server to run reports on the data. They are now changing to PDF format. Is there a way to reliably port the data from the PDF and insert it into our SQL Server 2008 database?

Would this require writing an app or is there an automated way of doing this?

Recommended answer

It all depends on how they've included the data within the PDF. Generally speaking, there are two possible scenarios here:

  1. The data is just a text object within the PDF. You'll need to use a tool to extract the text from the PDF and then insert it into your database.

  2. The data is contained within form fields in the PDF. You'll need to use a tool to extract the data from the form fields and insert it into your database.

Hopefully scenario #2 applies to you because this is precisely what PDF forms are designed for. Scenario #1 is really just a hack that you'd only use if you didn't have any other options. Extracting plain text from a PDF isn't as easy or accurate as you might expect.
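
To illustrate why scenario #1 is fragile, here is a minimal Python sketch that parses text which has already been extracted from a PDF. The report layout (an invoice number, a date, and an amount on each line) and the names `ROW_PATTERN` and `parse_rows` are assumptions made up for this example, not anything from the question. Extracted text often comes back with columns merged, reordered, or split across lines, so the sketch collects lines it cannot parse instead of silently dropping them.

```python
import re
from decimal import Decimal

# Hypothetical row layout: "INV-1001   2012-03-14  1,250.00"
ROW_PATTERN = re.compile(
    r"^(?P<invoice>INV-\d+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<amount>[\d,]+\.\d{2})$"
)

def parse_rows(text):
    """Split extracted text into parsed rows and lines that did not match."""
    rows, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = ROW_PATTERN.match(line)
        if match:
            rows.append((
                match["invoice"],
                match["date"],
                Decimal(match["amount"].replace(",", "")),
            ))
        else:
            rejected.append(line)  # keep for manual review
    return rows, rejected

# Sample text with a header line and one row whose amount wrapped
# onto the next line, as can happen with text extraction.
sample = """
Invoice Date Amount
INV-1001   2012-03-14  1,250.00
INV-1002   2012-03-15
  980.50
"""
rows, rejected = parse_rows(sample)
print(rows)
print(rejected)
```

In this sample only the first data row parses cleanly. The second row lands in `rejected` because its amount wrapped onto a separate line, which is the kind of inaccuracy that makes plain text extraction a last resort.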

If you're receiving a PDF form then all you need to do is match up the right fields in the PDF form with the corresponding fields in your database and then suck in the data. This process could be entirely automated if you wrote your own application.
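
The field-matching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the form field names, the `orders` table, and the helpers `form_to_row` and `insert_row` are hypothetical, and an in-memory `sqlite3` database stands in for SQL Server 2008 so the example runs anywhere. Against a real SQL Server you would use a SQL Server driver and its connection object instead.

```python
import sqlite3

# Hypothetical mapping: PDF form field name -> database column name.
FIELD_TO_COLUMN = {
    "CustomerName": "customer_name",
    "OrderDate": "order_date",
    "TotalAmount": "total_amount",
}

def form_to_row(form_values, mapping=FIELD_TO_COLUMN):
    """Translate form field values into a {column: value} dict.

    Raises KeyError if an expected field is missing from the form;
    form fields that are not in the mapping are ignored.
    """
    return {column: form_values[field] for field, column in mapping.items()}

def insert_row(conn, table, row):
    """Insert one row using a parameterized statement."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    # Table and column names come from our own mapping, never from the PDF.
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )

conn = sqlite3.connect(":memory:")  # stand-in for the real database connection
conn.execute(
    "CREATE TABLE orders (customer_name TEXT, order_date TEXT, total_amount REAL)"
)

# Values as they might come out of a PDF form (all strings).
form_values = {
    "CustomerName": "Acme Ltd",
    "OrderDate": "2012-03-14",
    "TotalAmount": "1250.00",
    "Notes": "not mapped, so ignored",
}
row = form_to_row(form_values)
row["total_amount"] = float(row["total_amount"])  # convert before storing
insert_row(conn, "orders", row)
conn.commit()

stored = conn.execute(
    "SELECT customer_name, order_date, total_amount FROM orders"
).fetchall()
print(stored)
```

Keeping the mapping in one dictionary means that a renamed form field only requires a change in one place, and a missing field fails loudly rather than inserting a partial row.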

Would this require writing an app or is there an automated way of doing this?

Yes, both of these options would require writing an app or buying an app. If you write your own app then you'll need to find a third-party PDF library that supports retrieving data from form fields or extracting text from a PDF.
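
As one possible starting point for writing your own app, here is a hedged Python sketch using `pypdf`, an open-source PDF library chosen here purely as an example; the answer does not name a specific library. It relies on `PdfReader.get_fields()` for form fields and `page.extract_text()` for plain text. The helper names (`load_reader`, `read_form_fields`, `read_text`, `extract`) and the stand-in classes used for the dry run are assumptions made for this illustration.

```python
def load_reader(path):
    """Open a PDF with pypdf (third-party; install with `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the rest runs without pypdf
    return PdfReader(path)

def read_form_fields(reader):
    """Return {field name: value} for scenario #2, or {} if there is no form."""
    fields = reader.get_fields() or {}  # pypdf returns None when no fields exist
    # Each field behaves like a dictionary; "/V" holds the field's value.
    return {name: field.get("/V") for name, field in fields.items()}

def read_text(reader):
    """Return the text of every page for scenario #1, joined with newlines."""
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def extract(reader):
    """Prefer form fields; fall back to plain text extraction."""
    fields = read_form_fields(reader)
    if fields:
        return {"kind": "form", "data": fields}
    return {"kind": "text", "data": read_text(reader)}

# Dry run with stand-in objects so the sketch can be tried without a PDF file.
class _FakePage:
    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text

class _FakeReader:
    def __init__(self, fields, pages):
        self._fields = fields
        self.pages = pages

    def get_fields(self):
        return self._fields

form_pdf = _FakeReader({"CustomerName": {"/V": "Acme Ltd"}}, [])
text_pdf = _FakeReader(None, [_FakePage("INV-1001 1,250.00"), _FakePage(None)])
form_result = extract(form_pdf)
text_result = extract(text_pdf)
print(form_result)
print(text_result)
```

With a real file you would call `extract(load_reader(path))`, where `path` points at one of the received PDFs, and then pass the result on to whatever code loads the database.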
