Document image processing


Problem description

I am working on an application for processing document images (mainly invoices). Basically, I'd like to convert certain regions of interest into an XML structure and then classify the document based on that data. Currently I am using ImageJ for analyzing the document images and Asprise/Tesseract for OCR.

Now I am looking for something to make development easier. Specifically, I am looking for something to automatically deskew a document image and analyze the document structure (e.g. converting an image into a quadtree structure for easier processing). Although I prefer Java and ImageJ, I am interested in any libraries/code/papers regardless of the programming language they are written in.

While the system I am working on should process data as automatically as possible, the user should oversee the results and, if necessary, correct the classification suggested by the system. Therefore I am interested in using machine learning techniques to achieve more reliable results. When similar documents are processed, e.g. invoices of a specific company, their structure is usually the same. When the user has previously corrected the data of documents from a company, these corrections should be taken into account in the future. I have only limited knowledge of machine learning techniques and would like to know how I could realize my idea.

Answer

The following prototype in Mathematica finds the coordinates of the blocks of text and performs OCR within each block. You may need to adapt the parameter values to fit the dimensions of your actual images. I do not address the machine learning part of the question; perhaps you would not even need it for this application.

Import the picture, create a binary mask for the printed parts, and enlarge these parts using a horizontal closing (a dilation followed by an erosion).

Query each blob's orientation, cluster the orientations, and determine the overall rotation by averaging the orientations of the largest cluster.
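
To make this clustering step concrete, here is a minimal sketch with made-up orientation values (in radians); the exact grouping depends on the data and on FindClusters' default settings. The full listing below simply takes the first cluster, whereas this sketch picks the largest one explicitly.

(* Toy illustration of the clustering step; the orientation values are made up *)
blobOrientations = {3.10, 3.11, 3.12, 1.57, 3.11, 0.02};
clusters = FindClusters[blobOrientations]  (* groups similar angles, e.g. {{3.1, 3.11, 3.12, 3.11}, {1.57}, {0.02}} *)
(* average the largest cluster to estimate the dominant text orientation *)
\[Theta] = Mean[First[MaximalBy[clusters, Length]]]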

Use the previous angle to straighten the image. At this point OCR is already possible, but you would lose the spatial information for the blocks of text, which would make the post-processing much harder than it needs to be. Instead, find the blobs of text with another horizontal closing.

For each connected component, query the bounding box and the centroid position. Use the bounding boxes to extract the corresponding image patches and perform OCR on each patch.

At this point, you have a list of strings and their spatial positions. That's not XML yet, but it sounds like a good starting point that can be tailored straightforwardly to your needs.
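
If you want to take this one small step further toward the XML structure mentioned in the question, a minimal sketch could wrap those position/text pairs in symbolic XML and export them; the element and attribute names ("document", "region", "x", "y") are arbitrary choices, not part of the original answer.

(* Sketch: turn a list of {x, y} -> "text" rules into an XML string.  *)
(* The element and attribute names here are arbitrary.                *)
toXML[rules_] := ExportString[
  XMLElement["document", {},
    Cases[rules, ({x_, y_} -> t_) :>
      XMLElement["region", {"x" -> ToString[x], "y" -> ToString[y]}, {t}]]],
  "XML"]

Applied to the centroid -> text rules built in the last line of the listing below (before TableForm), this would give one region element per recognized block of text.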

This is the code. Again, the parameters (structuring elements) of the morphological functions may need to change based on the scale of your actual images; also, if the invoice is too tilted, you may need to roughly "rotate" the structuring elements in order to still achieve good "un-skewing."

(* Import the invoice and build a binary mask of the printed parts *)
img = ColorConvert[Import@"http://www.team-bhp.com/forum/attachments/test-drives-initial-ownership-reports/490952d1296308008-laura-tsi-initial-ownership-experience-img023.jpg", "Grayscale"];
b = ColorNegate@Binarize[img];

(* Horizontal closing merges the characters of a line into one blob *)
mask = Closing[b, BoxMatrix[{2, 20}]]

(* Cluster the blob orientations; average the first cluster (assumed dominant) *)
orientations = ComponentMeasurements[mask, "Orientation"];
angles = FindClusters@orientations[[All, 2]]
\[Theta] = Mean[angles[[1]]]

(* Straighten the image using the estimated angle *)
straight = ColorNegate@Binarize[ImageRotate[img, \[Pi] - \[Theta], Background -> 1]]
TextRecognize[straight]

(* Find blobs of text in the deskewed image and measure their boxes and centroids *)
boxes = Closing[straight, BoxMatrix[{1, 20}]]
comp = MorphologicalComponents[boxes];
measurements = ComponentMeasurements[{comp, straight}, {"BoundingBox", "Centroid"}];

(* OCR each bounding box and keep the non-empty results with their centroids *)
texts = TextRecognize@ImageTrim[straight, #] & /@ measurements[[All, 2, 1]];
Cases[Thread[measurements[[All, 2, 2]] -> texts], (_ -> t_) /; StringLength[t] > 0] // TableForm
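
As mentioned above, the structuring elements may need to be adapted to the image scale. One rough way to do that, as a sketch only (the reference width of 1000 pixels and the base radii {2, 20} are arbitrary assumptions), is to scale the BoxMatrix radii with the image width:

(* Sketch: scale the structuring element with the image width.               *)
(* 1000 px is an arbitrary reference width; img and b come from the listing. *)
scale = ImageDimensions[img][[1]]/1000.;
mask = Closing[b, BoxMatrix[Round[{2, 20} scale]]];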
