How to extract charts/tables/graphs from PDF files using Python?

Views: 1315 Published: 2020/5/19 19:27:00 Tags: python pdf python-3.6 ocr extract

Problem description

I have searched quite a bit but could not find a solution to this kind of problem, so I am posting a clear question about it. Most answers cover image or text extraction, which is comparatively easier.

I need to extract tables and graphs from PDFs, as text (CSV) and as images respectively.

Can anyone help me with efficient Python 3.6 code to do this?

So far I have managed to extract JPEGs by scanning for startmark = b"\xff\xd8" and endmark = b"\xff\xd9", but not all tables and graphs in a PDF are plain JPEGs, so my code fails badly at this.
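For reference, the marker-scanning approach described above can be sketched roughly as follows. This is a minimal reconstruction, not my exact code: the names `extract_jpegs` and `save_jpegs` and the output file naming are assumptions made for illustration. It only finds JPEG data stored uncompressed in the file, so charts drawn as vector graphics, tables made of text, and images stored with other encodings are missed entirely, which is the limitation I am running into.

```python
from typing import List

START_MARK = b"\xff\xd8"  # JPEG start-of-image marker
END_MARK = b"\xff\xd9"    # JPEG end-of-image marker


def extract_jpegs(data: bytes) -> List[bytes]:
    """Return each byte run from a START_MARK up to the next END_MARK.

    Naive scan: a JPEG with an embedded thumbnail contains nested
    markers and would be cut short by this approach.
    """
    images = []
    pos = 0
    while True:
        start = data.find(START_MARK, pos)
        if start < 0:
            break  # no further start marker
        end = data.find(END_MARK, start + len(START_MARK))
        if end < 0:
            break  # start marker without a matching end marker
        end += len(END_MARK)
        images.append(data[start:end])
        pos = end  # continue scanning after this image
    return images


def save_jpegs(pdf_path: str, prefix: str = "image") -> int:
    """Write every candidate JPEG found in pdf_path to prefix_N.jpg.

    Hypothetical helper; returns the number of files written.
    """
    with open(pdf_path, "rb") as handle:
        data = handle.read()
    images = extract_jpegs(data)
    for index, image in enumerate(images):
        with open("{}_{}.jpg".format(prefix, index), "wb") as out:
            out.write(image)
    return len(images)
```

Calling `save_jpegs("annual-report-2017.pdf")` on a local copy of the report would write whatever raw JPEG streams the scan finds; the file name here is only an example.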

For example, I want to extract the table on page 11 and the graphs on page 12 of the PDF linked below, as images or in whatever form is feasible. How should I go about it?

https://hartmannazurecdn.azureedge.net/media/2369/annual-report-2017.pdf
