Convert PDF to Excel


Question

How can I convert a table inside a PDF to Excel?

I have tried some online tools, but the results were only about 60% correct.

A sample of the table in my PDF is given below. I have hidden the field that contains names.

Recommended answer

Getting data out of a PDF file is pretty messy. If the table in the PDF is ordered and follows a consistent pattern, the best way to get the data is to convert the PDF to XML. For this you can use pdftohtml.

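Install: sudo apt-get install pdftohtml (on current Debian/Ubuntu releases the tool is packaged in poppler-utils, so sudo apt-get install poppler-utils may be needed instead)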

Usage: pdftohtml -xml *Your File.pdf* *Output File.xml*

You can run this command directly in the terminal.
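If the conversion needs to run from a script rather than by hand, the same command can be wrapped in Python. This is only a minimal sketch: the helper names build_command and convert are made up for illustration, and it assumes pdftohtml is installed and on the PATH.

```python
import shutil
import subprocess


def build_command(pdf_path, xml_path):
    # Mirrors the usage line: pdftohtml -xml <input> <output>
    return ["pdftohtml", "-xml", pdf_path, xml_path]


def convert(pdf_path, xml_path):
    # Fail early with a clear message if the tool is not installed.
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml was not found on PATH")
    # Passing a list (not a shell string) keeps file names with spaces intact.
    subprocess.run(build_command(pdf_path, xml_path), check=True)
```

Calling convert("Your File.pdf", "Output File.xml") would then run the command shown in the usage line.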

The XML file you get will have tags, much like HTML, which you can use to pull the data out of the generated output.
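As a rough illustration of reading those tags: the XML that pdftohtml writes contains page elements holding text elements, each with top and left position attributes. The sketch below groups text elements that share the same top value into one row, sorts each row by left, and writes the result as CSV, which Excel can open. The sample XML and the function names rows_from_xml and to_csv are invented for this example; a real file will have more attributes and may need different grouping.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the shape of pdftohtml -xml output.
SAMPLE_XML = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">ID</text>
<text top="100" left="200" width="60" height="12" font="0"><b>Amount</b></text>
<text top="120" left="50" width="40" height="12" font="0">1</text>
<text top="120" left="200" width="60" height="12" font="0">250</text>
</page>
</pdf2xml>"""


def rows_from_xml(xml_text):
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)
        for node in page.iter("text"):
            # itertext() also picks up text nested in <b> or <i> tags.
            content = "".join(node.itertext()).strip()
            if content:
                cell = (int(node.get("left")), content)
                by_top[int(node.get("top"))].append(cell)
        # Rows run top to bottom; cells within a row run left to right.
        for top in sorted(by_top):
            rows.append([text for _, text in sorted(by_top[top])])
    return rows


def to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


print(to_csv(rows_from_xml(SAMPLE_XML)))
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring, and the CSV text can be written to a .csv file instead of printed.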

PS: Note that if the table in the PDF is not ordered, getting the data out of the XML becomes very difficult, because the tags will have attributes that do not match the pattern. In that case you will need to hard-code the handling.
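One common form of mismatch is cells of the same row having slightly different top values. A possible workaround, shown here purely as a sketch and not as part of the original answer, is to group cells whose top values fall within a small tolerance. The function name group_rows, the (top, left, text) tuple layout, and the tolerance of 3 are all assumptions that would need tuning per document.

```python
def group_rows(cells, tolerance=3):
    """Group (top, left, text) cells into rows, allowing small top offsets."""
    rows = []
    current = []
    row_top = None
    for top, left, text in sorted(cells):
        # Start a new row when this cell sits clearly below the row's first cell.
        if row_top is None or top - row_top > tolerance:
            if current:
                rows.append(current)
            current = []
            row_top = top
        current.append((left, text))
    if current:
        rows.append(current)
    # Within each row, order cells left to right and keep only the text.
    return [[text for _, text in sorted(row)] for row in rows]


cells = [(100, 50, "ID"), (102, 200, "Amount"), (120, 50, "1"), (121, 200, "250")]
print(group_rows(cells))
```

Tables with merged or missing cells will still need document-specific rules on top of this.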
