Detect text orientation

Question

How can I detect the orientation of text in an image?

It doesn't matter if the text is upside down (180 degrees). But if the text lines are vertical (90 or 270 degrees), I need to rotate the image by 90 degrees.

I hope it's possible without OCR, because running OCR on four different orientations of the same image takes too many resources.

The reason is that I use scantailor on images from a digital camera or smartphone, and when the text is oriented at 90 or 270 degrees, the image is sometimes cropped and text is lost.

Recommended answer

The proposed solution (Hough transform) is good (and I upvoted it), but it might be CPU intensive. Here is a quick and dirty solution:

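  1. Just calculate the horizontal projection (sum the brightness of the pixels in each pixel row). It should clearly mark the positions of the text lines (bonus: you get a partition of the text into lines). Apply Otsu binarization to see the partition clearly.
  2. Rotate the image by 90 degrees and repeat step 1. If the text lines are now perpendicular to the pixel rows, the projection should be just a blurry mess with no clear partition into text lines (bonus: this partition will mark the borders of the page, and if the text is arranged in columns, you will get the column structure).
  3. Now you just need to decide which projection (step 1 or step 2) represents real text lines. You can count the blobs (they are one-dimensional, so processing is very fast) and choose the projection with more blobs (there are more lines than text columns). Alternatively, you can just calculate the standard deviation of each projection vector and take the one with the higher std. This is even faster.
  4. All of the above holds if the text is cleanly oriented at 0 or 90 degrees. If it is rotated by, say, 10 degrees, both projections will be a mess. In that case, you can cut the document into a 5x5 grid (25 tiles), perform steps 1, 2 and 3 on each tile, and go with the majority decision.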
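
Steps 1 to 4 can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the answerer's code. The image is assumed to be a grayscale list of rows of 0-255 values, and the function names are made up for this sketch. The column projection stands in for "rotate by 90 degrees and repeat step 1". Each projection is divided by the number of summed pixels so that non-square images compare fairly, which is a normalization the answer does not mention. A real implementation would more likely use NumPy or OpenCV arrays.

```python
from statistics import pstdev


def projection_std(image):
    """Return (row_std, col_std) of the normalized projections of a grayscale image."""
    height, width = len(image), len(image[0])
    # Step 1: horizontal projection, one mean brightness per pixel row.
    rows = [sum(row) / width for row in image]
    # Step 2: the same projection on the rotated image, i.e. one value per pixel column.
    # Assumption: dividing by the pixel count keeps both vectors on the same scale.
    cols = [sum(col) / height for col in zip(*image)]
    return pstdev(rows), pstdev(cols)


def lines_are_vertical(image):
    """Step 3: the projection with the higher std is taken as aligned with the text lines."""
    row_std, col_std = projection_std(image)
    return col_std > row_std


def lines_are_vertical_by_vote(image, grid=5):
    """Step 4: split the page into grid x grid tiles and take a majority vote."""
    height, width = len(image), len(image[0])
    vertical = horizontal = 0
    for i in range(grid):
        for j in range(grid):
            tile = [row[j * width // grid:(j + 1) * width // grid]
                    for row in image[i * height // grid:(i + 1) * height // grid]]
            if not tile or not tile[0]:
                continue  # the image is smaller than the grid
            row_std, col_std = projection_std(tile)
            # Tiles where both values are equal (for example blank tiles) cast no vote.
            if col_std > row_std:
                vertical += 1
            elif row_std > col_std:
                horizontal += 1
    return vertical > horizontal
```

If either function returns `True`, the image would be rotated by 90 degrees before further processing. Which of the two 90-degree directions is used does not matter here, since upside-down text is acceptable in the question.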

Note: The described solution is a bit less accurate than the Hough transform, but it is very easy to implement and extremely fast (the entire processing is faster than just calculating the derivatives of the image). You also get for free the orientation of the text lines and a partition of the document into lines and columns.

Good luck

Addition and clarification to step 1: Suppose you have an image of width 'W' and height 'H' with black text on a white background. By doing a horizontal projection, you sum the pixel values in each row. The result is a vector of length 'H'. Pixel rows that don't include any part of the text (those located between text lines) will yield high projection values (because the background is white, 255). Pixel rows that include parts of letters will yield lower projection values.

So now you have a vector of length H, and you want to see whether there is a clear partition of values inside it: a group of high values, then a group of low values, and so on (like zebra stripes). Example: if the distance between text lines is 20 pixels and each letter is 16 pixels high, you expect the projection vector to have 20 high values followed by 16 low values, followed by 20 high values, 16 low, etc. Of course the document is not ideal: each letter has a different height and some have holes (like 't', 'q' and 'i'), but the general rule of partition holds. In contrast, if you rotate the document by 90 degrees so that your summation no longer aligns with the lines of text, the resulting vector will just have roughly random values without a clear partition into groups.

Now all you need to do is decide whether your result vector has a good partition or not. A quick way to do so is to calculate the standard deviation of the values. If there is a partition, the std will be high; otherwise it will be lower.

Another way is to binarize your projection vector, treat it as a new image of size 1xH, run connected component analysis and extract the blobs. This is very fast because the blobs are one-dimensional. The bright blobs will roughly mark the areas between text lines, and the dark holes will mark the text lines. If your summation was good (the vector had a clear partition), you will have a few large blobs (the number of blobs is roughly the number of lines, and the median blob length is roughly the distance between text lines). But if your summation was wrong (document rotated by 90 degrees), you will get many random blobs. Connected component analysis requires a bit more code (compared to std), but it can give you the locations of the text lines: line 'i' will be between blob 'i' and blob 'i+1'.
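
The blob-based check can be sketched in the same spirit. This is an illustrative sketch, not the answerer's implementation. The helper names are hypothetical, and the midpoint threshold is a simple stand-in for the Otsu binarization mentioned in step 1. Because the projection is one-dimensional, connected component analysis reduces to finding runs of equal values.

```python
from itertools import groupby


def binarize(projection):
    """Mark each value as bright (True) or dark (False).

    Assumption: the midpoint between min and max replaces Otsu's threshold.
    """
    threshold = (min(projection) + max(projection)) / 2
    return [value > threshold for value in projection]


def runs(bits):
    """Return one (is_bright, start, length) tuple per run of equal values."""
    result = []
    start = 0
    for is_bright, group in groupby(bits):
        length = len(list(group))
        result.append((is_bright, start, length))
        start += length
    return result


def text_line_spans(projection):
    """Dark runs are the text lines: return their (first_row, last_row) pairs."""
    return [(start, start + length - 1)
            for is_bright, start, length in runs(binarize(projection))
            if not is_bright]
```

For the 20/16 example, a projection of 20 high values followed by 16 low values, repeated, gives dark runs at rows 20 to 35 and 56 to 71. The bright runs in between are the blobs that separate the text lines.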
