Detect text orientation


Question

How to detect text orientation in an image?

It doesn't matter if the orientation is upside down (180 degrees). But if the text lines are vertical (90 or 270 degrees), I need to rotate the image by 90 degrees.

I hope it's possible without OCR, because running OCR on four different orientations of the same image takes too many resources.

The reason is that I use scantailor on images from a digital camera or smartphone, and if the text orientation is 90 or 270 degrees, the image is sometimes cropped and text is lost.

Recommended answer

The proposed solution (Hough transform) is good (and I upvoted it), but it might be CPU intensive. Here is a quick-and-dirty solution:


  1. Calculate a horizontal projection (sum the brightness of the pixels in each pixel row). It should clearly mark the positions of the text lines (bonus: you get a partition of the text into lines). Apply Otsu binarization to see the partition clearly.
  2. Rotate the image by 90 degrees and repeat step 1. If the text lines are now perpendicular to the pixel rows, the projection should be just a blurry mess with no clear partition into text lines. (Bonus: this projection will mark the borders of the page and, if the text is arranged in columns, give you the structure of the columns.)
  3. Now you just need to decide which projection (step 1 or step 2) represents the real text lines. You can count the blobs (they are one-dimensional, so the processing is extremely fast) and choose the projection with more blobs (there are more lines than text columns). Alternatively, you can calculate the standard deviation of each projection vector and take the one with the higher 'std'. This is faster still.
  4. All of the above holds if the text runs clearly at 0 degrees or 90 degrees. If it is rotated by, say, 10 degrees, then both projections will return a mess. In that case you can cut the document into, say, 5x5 pieces (25 pieces), perform steps 1, 2 and 3 on each piece, and decide according to the majority.
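The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function names are made up, NumPy is assumed, and the image is assumed to be a 2-D grayscale array with dark text on a light background. Each projection uses the mean instead of the sum, which is an added assumption so that the row and column projections stay comparable on non-square images. Projecting along the other axis stands in for rotating by 90 degrees.

```python
import numpy as np

def projection_std(gray, axis):
    """Standard deviation of the brightness projection along `axis`.

    axis=1 averages every pixel row (horizontal projection);
    axis=0 averages every pixel column, which is the same as rotating
    the image by 90 degrees and projecting the rows again.
    """
    projection = np.asarray(gray, dtype=np.float64).mean(axis=axis)
    return float(projection.std())

def text_is_vertical(gray):
    """Steps 1-3: True if the column projection is the more 'striped' one."""
    rows_std = projection_std(gray, axis=1)
    cols_std = projection_std(gray, axis=0)
    return cols_std > rows_std

def text_is_vertical_tiled(gray, grid=5):
    """Step 4: majority vote over grid x grid tiles (hypothetical helper)."""
    gray = np.asarray(gray)
    h, w = gray.shape
    vertical_votes = 0
    total_votes = 0
    for i in range(grid):
        for j in range(grid):
            tile = gray[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            if tile.size == 0:
                continue
            rows_std = projection_std(tile, axis=1)
            cols_std = projection_std(tile, axis=0)
            if rows_std == cols_std:
                continue  # blank or undecidable tile: it does not vote
            total_votes += 1
            vertical_votes += cols_std > rows_std
    return vertical_votes * 2 > total_votes
```

If `text_is_vertical_tiled(image)` returns True, the image would be rotated by 90 degrees before further processing. Whether the rotation should be clockwise or counter-clockwise does not matter here, because the question treats upside-down text as acceptable.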

Note: The described solution is a bit less accurate than the Hough transform, but it is very easy to implement and extremely fast (the entire processing is faster than just calculating the derivatives of the image). You also get, for free, the orientation of the text lines and a partition of the document into lines and columns.

Good luck

Addition and clarification to step 1: Suppose you have an image of width 'W' and height 'H' with black text on a white background. By doing a horizontal projection you sum the values of the pixels in each row. The result is a vector of length 'H'. Pixel rows that don't include any part of the text (and so lie between text lines) will yield high projection values (because the background is white, 255). Pixel rows that include parts of letters will yield lower projection values.

So now you have a vector of length H and you want to see whether there is a clear partition of values inside it: a group of high values, then a group of low values, and so on (like zebra stripes). Example: if you have 20 pixels between text lines and each letter is 16 pixels high, you expect the projection vector to have 20 high values followed by 16 low values, followed by 20 high values, 16 low values, etc. Of course the document is not ideal, each letter has a different height and some have holes (like 't', 'q' and 'i'), but the general rule of partition holds.

On the contrary, if you rotate the document by 90 degrees so that your summation no longer aligns with the lines of text, the result vector will just contain roughly random values with no clear partition into groups.

Now all you need to do is decide whether your result vector has a good partition or not. A quick way to do so is to calculate the standard deviation of the values. If there is a partition, the std will be high; otherwise it will be lower.

Another way is to binarize your projection vector, treat it as a new image of size 1xH, run connected components analysis on it and extract the blobs. This is very fast because the blobs are one-dimensional. The bright blobs will roughly mark the areas between text lines, and the dark holes will mark the text lines. If your summation was good (the vector had a clear partition), you will have a few large blobs (the number of blobs is roughly the number of lines, and the median length of a blob is roughly the distance between text lines). But if your summation was wrong (the document was rotated by 90 degrees), you will get many random blobs. The connected components analysis requires a bit more code than the std, but it can give you the locations of the text lines: line 'i' will be between blob 'i' and blob 'i+1'.
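The one-dimensional blob analysis described above might look something like this. It is a sketch under stated assumptions: NumPy is assumed, the names `bright_runs` and `text_line_spans` are hypothetical, and the default threshold is simply the midpoint between the smallest and largest projection value, used here as a stand-in for a real Otsu threshold.

```python
import numpy as np

def bright_runs(projection, threshold=None):
    """Return (start, end) index pairs of bright runs, end exclusive.

    Each run is a 1-D 'blob' of projection values above the threshold,
    i.e. a band of background between text lines.
    """
    p = np.asarray(projection, dtype=np.float64)
    if threshold is None:
        # Assumption: midpoint threshold instead of Otsu, for brevity.
        threshold = (p.min() + p.max()) / 2.0
    bright = p > threshold
    # Pad with False on both sides so every run has a start and an end.
    padded = np.concatenate(([False], bright, [False])).astype(np.int8)
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)
    ends = np.flatnonzero(change == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def text_line_spans(runs):
    """Text line i lies between bright blob i and bright blob i+1."""
    return [(runs[i][1], runs[i + 1][0]) for i in range(len(runs) - 1)]
```

For the zebra-stripe example from the text (20 high values, 16 low values, repeated), `bright_runs` would return one run per gap between lines and `text_line_spans` would return the 16-pixel bands in between. Comparing the number of runs obtained from the row projection and from the column projection is then one way to make the decision in step 3.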
