Align text for OCR

Published: 2020/5/19 19:27:44. Tags: python, image-processing, ocr

Question

I am creating a database from historical records which I have as photographed pages from books (+100K pages). I wrote some python code to do some image processing before I OCR each page. Since the data in these books does not come in well formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.

One of the critical steps is to align the text in the image.

For example, this is a typical page that needs to be aligned:

A solution I found is to smudge the text horizontally (I'm using skimage.ndimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.
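A minimal sketch of the smudge-and-rotate search described above, assuming a binarized page where ink pixels are `True`. It uses `scipy.ndimage.binary_dilation` and `scipy.ndimage.rotate` rather than the exact calls from the question, scores each angle by the largest row sum (one reading of "maximizes the sum of white pixels along the horizontal dimension"), and replaces the scipy optimizer with a plain grid search. The function names and the `width`, `max_angle` and `step` defaults are hypothetical.

```python
import numpy as np
from scipy import ndimage


def smudge(binary, width=15):
    """Dilate ink pixels horizontally so each text line merges into a solid bar."""
    structure = np.ones((1, width), dtype=bool)  # 1 row tall, `width` columns wide
    return ndimage.binary_dilation(binary, structure=structure)


def projection_score(binary, angle):
    """Rotate by `angle` degrees and return the largest number of ink pixels in any row."""
    rotated = ndimage.rotate(binary.astype(np.float32), angle, reshape=False, order=0)
    return int((rotated > 0.5).sum(axis=1).max())


def estimate_skew_bruteforce(binary, max_angle=5.0, step=0.5, width=15):
    """Try every candidate angle and keep the one whose horizontal profile peaks highest."""
    smudged = smudge(binary, width)
    angles = np.arange(-max_angle, max_angle + step / 2, step)
    scores = [projection_score(smudged, a) for a in angles]
    return float(angles[int(np.argmax(scores))])
```

Every candidate angle costs one full-image rotation, so the run time grows with both the page size and the number of angles tried. That is why this kind of search gets slow on large scans.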

This works fine, but it takes about 8 seconds per page, which given the volume of pages I am working with, is way too much.

Do you know of a better, faster way to align the text?

I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.

Here is a link to an html view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project so it cannot be run on its own.

Link to notebook (dropbox): https://db.tt/Mls9Tk8s

Here is a link to the original raw image (dropbox): https://db.tt/1t9kAt0z

Recommended answer

Preface: I haven't done much image processing with Python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so that should be straightforward.

You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:
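The padding and centered spectrum described above can be sketched with NumPy alone. This assumes a grayscale page scaled so that ink is 1 and paper is 0, so zero padding behaves like extra blank paper; the function name `padded_log_spectrum` is hypothetical, and the log scaling is a display choice, not part of the answer.

```python
import numpy as np


def padded_log_spectrum(image, size=None):
    """Zero-pad `image` into a square power-of-two canvas and return its centered log-magnitude FFT."""
    h, w = image.shape
    if size is None:
        size = 1 << int(np.ceil(np.log2(max(h, w))))  # next power of two, e.g. 1500 -> 2048
    canvas = np.zeros((size, size), dtype=np.float64)
    canvas[:h, :w] = image  # assumes ink = 1 and paper = 0, so the padding reads as blank paper
    spectrum = np.fft.fftshift(np.fft.fft2(canvas))  # move the DC term to the middle
    return np.log1p(np.abs(spectrum))  # log scale keeps weaker directional structure visible
```

Where the page sits inside the canvas does not matter for this magnitude image, because shifting an image only changes the phase of its Fourier transform.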

I've posted an intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:

Then simply take the column mean:

The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835 modulo 90, the orientation looks decent:
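A possible end-to-end sketch of the FFT → polar transform → column mean → peak steps, under stated assumptions. Ink is 1 and paper is 0. The polar resampling uses `scipy.ndimage.map_coordinates` instead of an OpenCV function. Angles are measured counter-clockwise as the image is displayed. The radius cutoff `r_min` and the angular sampling `n_angles` are hypothetical choices. Because the answer's own polar convention is unknown, the sign of the returned angle may need flipping for a given rotation routine. The final line also shows one way to compute "-peak modulo 90" in Python, where `%` alone returns a value in [0, 90).

```python
import numpy as np
from scipy import ndimage


def padded_log_spectrum(image, size=None):
    """Zero-pad to a square power-of-two canvas; return the centered log-magnitude spectrum."""
    h, w = image.shape
    if size is None:
        size = 1 << int(np.ceil(np.log2(max(h, w))))
    canvas = np.zeros((size, size), dtype=np.float64)
    canvas[:h, :w] = image  # assumes ink = 1, paper = 0
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(canvas))))


def angular_profile(log_mag, n_angles=1800, r_min=10):
    """Polar-resample the centered spectrum (rows = radius, columns = angle), then average each column."""
    size = log_mag.shape[0]
    center = size // 2  # DC position after fftshift for an even size
    radii = np.arange(r_min, size // 2)  # skip the bright area right around DC
    thetas = np.linspace(0.0, 180.0, n_angles, endpoint=False)  # |F| of a real image is point-symmetric
    rr, tt = np.meshgrid(radii, np.deg2rad(thetas), indexing="ij")
    rows = center - rr * np.sin(tt)  # counter-clockwise angles with the y axis pointing up on screen
    cols = center + rr * np.cos(tt)
    polar = ndimage.map_coordinates(log_mag, [rows, cols], order=1)
    return thetas, polar.mean(axis=0)  # the "column mean": one value per angle


def estimate_skew_fft(image, size=None):
    """Return the counter-clockwise rotation (degrees, in [-45, 45)) that should level the text."""
    thetas, profile = angular_profile(padded_log_spectrum(image, size))
    peak = thetas[np.argmax(profile)]
    # "-peak modulo 90", folded into [-45, 45) so small tilts give small corrections
    return float(((-peak + 45.0) % 90.0) - 45.0)
```

With the answer's value, the folding works out as ((-90.835 + 45) % 90) - 45 = -0.835, matching the "-90.835 modulo 90" rotation it describes. The grid of `n_angles` only sets how finely the profile is sampled; the usable precision still depends on the spectrum itself.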

Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.

Note 1: The FFT is fast, but it obviously takes more time for larger images. Sadly, the best way to get a better angle resolution is to use a larger input image (i.e., with more white padding around the source image).
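A rough back-of-the-envelope view of why more padding helps, assuming the N×N padded spectrum is sampled on a unit frequency grid. At radius r (in frequency bins), neighbouring grid points on a circle are about 1/r radians apart. The largest radius inside the spectrum is N/2. This only describes the sampling grid. Zero padding interpolates the spectrum rather than adding new information, so how sharp the peak is for a given page is a separate question.

$$\Delta\theta \approx \frac{1}{r}\ \text{rad}, \qquad r_{\max} = \frac{N}{2} \;\Rightarrow\; \Delta\theta_{\min} \approx \frac{2}{N}\ \text{rad}$$

$$N = 1024:\ \Delta\theta_{\min} \approx 0.112^\circ, \qquad N = 2048:\ \Delta\theta_{\min} \approx \frac{2}{2048}\ \text{rad} \approx 0.056^\circ$$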

Note 2: The FFT actually returns an image where the "DC" component (the center of the spectrum image above) is at the origin (0, 0). But the rotation property is clearer if you shift it to the center, and that makes the polar transform easier, so I only showed the shifted version.
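A tiny NumPy check of where the DC term lands before and after `np.fft.fftshift`, using an arbitrary 8×8 non-negative test pattern. For such an image the DC term has the largest magnitude, so its position is easy to find.

```python
import numpy as np

image = np.zeros((8, 8))
image[2:5, 3:6] = 1.0  # small non-negative test pattern

spectrum = np.fft.fft2(image)  # raw FFT: DC sits at index (0, 0)
dc_raw = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape))

shifted = np.fft.fftshift(spectrum)  # swap quadrants so DC moves to (N // 2, N // 2)
dc_shifted = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(shifted)), shifted.shape))

print(dc_raw, dc_shifted)  # (0, 0) (4, 4)
```

`np.fft.ifftshift` undoes the shift, which matters if the shifted spectrum is ever transformed back.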
