Align text for OCR


Question

I am creating a database from historical records that I have as photographed pages from books (more than 100K pages). I wrote some Python code to do some image processing before I OCR each page. Since the data in these books does not come in well-formatted tables, I need to segment each page into rows and columns and then OCR each piece separately.

One of the critical steps is to align the text in the image.

For example, this is a typical page that needs to be aligned:

A solution I found is to smudge the text horizontally (I'm using skimage.ndimage.morphology.binary_dilation) and find the rotation that maximizes the sum of white pixels along the horizontal dimension.

This works fine, but it takes about 8 seconds per page, which is far too slow given the volume of pages I am working with.

Do you know of a better, faster way to align the text?

I use scikit-image for image processing functions, and scipy to maximize the count of white pixels along the horizontal axis.
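For readers without access to the notebook, the approach described above can be sketched roughly as follows. This is not the notebook's code. The function names, the 1 x 15 structuring element, the sum-of-squared-row-sums objective, and the plain grid search (standing in for the scipy optimizer) are all assumptions made for illustration, and `scipy.ndimage` is used here for both the dilation and the rotation.

```python
import numpy as np
from scipy import ndimage


def alignment_score(binary, angle_deg):
    """Rotate a binary image and score how concentrated its row sums are."""
    rotated = ndimage.rotate(binary.astype(float), angle_deg,
                             reshape=False, order=1)
    row_sums = rotated.sum(axis=1)
    # Assumed objective: when text lines are horizontal, a few rows hold
    # most of the white pixels, so the sum of squared row sums is largest.
    return float(np.sum(row_sums ** 2))


def brute_force_skew(binary, max_angle=5.0, step=0.25, smear=15):
    """Return the correction angle (degrees) that gives the best score."""
    # Horizontal "smudge": dilate with a 1 x smear structuring element.
    structure = np.ones((1, smear), dtype=bool)
    smeared = ndimage.binary_dilation(binary, structure=structure)
    # Grid search over candidate angles (hypothetical stand-in for an optimizer).
    angles = np.arange(-max_angle, max_angle + step, step)
    scores = [alignment_score(smeared, a) for a in angles]
    return float(angles[int(np.argmax(scores))])
```

Every candidate angle costs one full image rotation, which is where most of the time goes on large pages.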

Here is a link to an HTML view of the Jupyter notebook I used to work on this. The code uses some functions from a module I've written for this project, so it cannot be run on its own.

Link to notebook (Dropbox): https://db.tt/Mls9Tk8s

Here is a link to the original raw image (Dropbox): https://db.tt/1t9kAt0z

Recommended answer

Preface: I haven't done much image processing with Python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is an FFT and a polar transformation (I think OpenCV has a built-in function for that), so that should be straightforward.

You have only posted one sample image, so I don't know if this works as well for other images, but for this image a Fourier transform can be very useful: simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:
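A minimal NumPy sketch of this padding and spectrum step, for illustration only. The function name `padded_spectrum`, the centered placement, the white fill value of 1.0 (which assumes a grayscale image scaled to [0, 1]), and the log scaling for display are assumptions, not details given in the answer.

```python
import numpy as np


def padded_spectrum(gray, fill=1.0):
    """Pad a 2-D grayscale image to a power-of-two square and return
    its centered log-magnitude Fourier spectrum."""
    h, w = gray.shape
    # Smallest power of two that fits the larger side.
    size = 1 << (max(h, w) - 1).bit_length()
    canvas = np.full((size, size), fill, dtype=float)
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = gray
    # fftshift moves the zero-frequency term to the middle of the array.
    spectrum = np.fft.fftshift(np.fft.fft2(canvas))
    # log1p compresses the dynamic range so weaker frequencies stay visible.
    return np.log1p(np.abs(spectrum))
```

The returned array can be displayed directly as an image to reproduce a spectrum plot of this kind.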

I've posted an intuitive explanation of the Fourier transform elsewhere, but in short: your image can be represented as a series of sine/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:

and simply take the column mean:

The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835° modulo 90°, the orientation looks decent:
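The whole procedure (spectrum, polar transform, column mean, peak, modulo 90) could be sketched in NumPy along these lines. This is an illustrative sketch under stated assumptions, not the code used to produce the figures: the function name, the nearest-neighbour polar sampling, the log scaling, the skipped low-frequency radii, and the mean subtraction plus Hann taper (added here to suppress the cross that image borders put into the spectrum, in place of white padding) are all choices of this sketch. It also assumes the image is no larger than `size` in either dimension.

```python
import numpy as np


def estimate_skew_fft(gray, size=256, n_angles=720):
    """Estimate text skew in degrees from the Fourier spectrum of a
    2-D grayscale image (assumes both sides are <= size)."""
    h, w = gray.shape
    # Remove the mean and taper the edges (additions of this sketch).
    window = np.outer(np.hanning(h), np.hanning(w))
    tapered = (gray - gray.mean()) * window
    # Zero-pad into a square canvas, image centered.
    canvas = np.zeros((size, size))
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = tapered
    # Centered log-magnitude spectrum.
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(canvas))))
    # Polar resampling: rows index radius, columns index angle in [0, 180).
    center = size // 2
    radii = np.arange(size // 32, center)  # skip the lowest frequencies
    thetas = np.deg2rad(np.arange(n_angles) * 180.0 / n_angles)
    rows = center + np.rint(radii[:, None] * np.sin(thetas)[None, :]).astype(int)
    cols = center + np.rint(radii[:, None] * np.cos(thetas)[None, :]).astype(int)
    polar = spectrum[rows, cols]
    # Column mean over all radii, then the angle of the strongest response.
    profile = polar.mean(axis=0)
    peak_deg = np.degrees(thetas[int(np.argmax(profile))])
    # "Modulo 90": fold the peak into [-45, 45) to get the skew itself.
    return float((peak_deg + 45.0) % 90.0 - 45.0)
```

With this folding, a peak at 90.835° maps to a skew of 0.835°. Which rotation sign undoes that skew depends on the rotation routine's convention, so it is worth checking once on a test page.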

Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.

Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image).
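As a rough back-of-the-envelope illustration of that trade-off: at the outer edge of an N x N spectrum, neighbouring frequency bins are about atan(1 / (N/2)) apart in angle. The helper below is hypothetical and only estimates this bin geometry; the accuracy achieved in practice also depends on the image content and on how the peak is located.

```python
import math


def angular_step_deg(size):
    """Approximate angle between adjacent bins at the outer radius
    (size / 2) of a size x size spectrum, in degrees."""
    return math.degrees(math.atan(1.0 / (size / 2)))


step_2048 = angular_step_deg(2048)  # roughly 0.056 degrees
step_4096 = angular_step_deg(4096)  # roughly half of that
```

Doubling the padded size roughly halves the step, at the cost of an FFT over four times as many pixels.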

Note 2: The FFT actually returns an image where the "DC" component (the center of the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.
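In NumPy terms (an assumption, since the answer names no library), the shift described in Note 2 is what `np.fft.fftshift` does. A tiny example with a flat image, whose only frequency component is the DC term:

```python
import numpy as np

# A flat 8x8 image contains only the DC (zero-frequency) component.
image = np.ones((8, 8))

raw = np.abs(np.fft.fft2(image))  # DC term sits at index [0, 0]
shifted = np.fft.fftshift(raw)    # DC term moved to the center, index [4, 4]

dc_raw = np.unravel_index(np.argmax(raw), raw.shape)
dc_shifted = np.unravel_index(np.argmax(shifted), shifted.shape)
```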
