Extract / Identify Tables from PDF python

Views: 115 Published: 2020/5/25 3:52:10 Tags: python pdf scrape pdf-scraping

Problem description

Are there any open source libraries that support table identification and extraction?

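By this I mean:

1. Identify that a table structure exists
2. Classify the table based on its contents
3. Extract data from the table in a useful output format, e.g. JSON / CSV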

I have looked through similar questions on this topic and found the following:

PDFMiner, which addresses problem 3, but it seems the user is required to tell PDFMiner where each table is located (correct me if I'm wrong)
pdf-table-extract, which attempts to address problem 1 but, according to its To-Do list, cannot currently identify tables that are separated by whitespace. This is a problem, as all tables in my PDFs are separated by whitespace!

Currently, I am thinking that I would have to spend a lot of time developing a machine learning solution to identify table structures in PDFs. Therefore, any alternative approaches would be more than welcome!
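
To make problems 1 and 3 concrete for whitespace-separated tables, here is a minimal sketch of a non-ML heuristic. It assumes the page text has already been extracted to plain lines with the horizontal spacing preserved, and that columns are separated by at least two spaces. The function names (`split_cells`, `find_tables`, `table_to_csv`) and the two-space threshold are hypothetical choices for illustration, not part of PDFMiner or pdf-table-extract, and the sketch does not attempt problem 2 (classification).

```python
import csv
import io
import re

# Assumption: a gap of two or more whitespace characters separates columns.
CELL_GAP = re.compile(r"\s{2,}")


def split_cells(line):
    """Split one text line into cells at runs of 2+ whitespace characters."""
    return [cell for cell in CELL_GAP.split(line.strip()) if cell]


def find_tables(lines, min_rows=2):
    """Group consecutive lines with the same cell count (>= 2) into tables."""
    tables, current = [], []
    for line in lines:
        cells = split_cells(line)
        same_width = not current or len(cells) == len(current[0])
        if len(cells) >= 2 and same_width:
            current.append(cells)
            continue
        # The run of table-like lines ended: keep it only if long enough.
        if len(current) >= min_rows:
            tables.append(current)
        # A multi-cell line with a different width starts a new candidate.
        current = [cells] if len(cells) >= 2 else []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


def table_to_csv(table):
    """Serialize one detected table (a list of rows) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()
```

Calling `find_tables` on the extracted lines of a page returns a list of tables, each a list of rows, and `table_to_csv` turns one of them into CSV text. Cells that contain wide internal gaps, merged cells, and ragged rows would all defeat this heuristic, so it is only a starting point before reaching for machine learning.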
