Extract Images and Words with coordinates and sizes from PDF

Views: 272 | Published: 2018/7/24 16:54:42 | Tags: image, pdf, coordinates, extraction, words

Problem description

I've read a lot about PDF extraction and libraries (such as iText), but I haven't found a solution for extracting images and text (with coordinates) from a PDF.

The task is to scan a PDF containing a product catalog and extract each image. An image code is printed next to each image, along with a list of product codes for the products shown in that image.

I know there is no way to extract structured information from a PDF like this, but with the coordinates of all image and text objects I could write code that links text to an image by its distance from that image. Then I could split the text using a regular expression and work out which part is a product code, which is an image code, and so on.
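To make the idea concrete, here is a minimal sketch of the linking and splitting steps. It assumes the extraction step has already produced bounding boxes for images and text. The `Box` structure, the `max_distance` threshold, and both code patterns (`IMG-` followed by digits, two capital letters followed by four digits) are hypothetical and only for illustration; the real catalog will need its own formats, and no particular PDF library is implied.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical structure)."""
    x: float
    y: float
    width: float
    height: float
    text: str = ""  # left empty for image boxes

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)


def link_text_to_images(images, texts, max_distance=200.0):
    """Attach each text box to its nearest image, if it is close enough.

    Returns a dict mapping image index -> list of text fragments.
    max_distance is an assumed threshold in page units.
    """
    linked = {i: [] for i in range(len(images))}
    if not images:
        return linked
    for t in texts:
        nearest = min(range(len(images)), key=lambda i: distance(images[i], t))
        if distance(images[nearest], t) <= max_distance:
            linked[nearest].append(t.text)
    return linked


# Assumed code formats, purely illustrative.
IMAGE_CODE = re.compile(r"\bIMG-\d+\b")
PRODUCT_CODE = re.compile(r"\b[A-Z]{2}\d{4}\b")


def classify(fragments):
    """Split the text linked to one image into image codes and product codes."""
    joined = " ".join(fragments)
    return {
        "image_codes": IMAGE_CODE.findall(joined),
        "product_codes": PRODUCT_CODE.findall(joined),
    }
```

Center-to-center distance is the simplest choice here; edge-to-edge distance, or a rule that only accepts text below or beside an image, may fit a given catalog layout better.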

Could you recommend a good, working solution for this task?
