Split text lines in scanned document
Question
I am trying to find a way to split the lines of text in a scanned document that has been adaptively thresholded. Right now, I store the pixel values of the document as unsigned ints from 0 to 255 and take the average of the pixels in each row. I treat every row whose average is larger than 250 as whitespace, group consecutive whitespace rows into ranges, and take the midpoint of each range as a cut position. However, this method sometimes fails, because there can be black splotches on the image.
Is there a more noise-resistant way to do this task?
Here is some code. "warped" is the name of the original image, and "cuts" holds the positions where I want to split the image.
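import numpy as np
# threshold_adaptive was deprecated in later scikit-image releases in favor of threshold_local
from skimage.filters import threshold_adaptive

warped = threshold_adaptive(warped, 250, offset=10)
warped = warped.astype("uint8") * 255

# get areas where we can split image on whitespace to make OCR more accurate
color_level = np.array([np.sum(line) / len(line) for line in warped])
cuts = []
i = 0
while i < len(color_level):
    if color_level[i] > 250:
        begin = i
        # the bounds check keeps a whitespace run at the bottom edge from indexing past the end
        while i < len(color_level) and color_level[i] > 250:
            i += 1
        cuts.append((i + begin) // 2)  # middle of the whitespace region
    else:
        i += 1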
Edit: added a sample image.
Recommended answer
Starting from your input image, you need to make the text white and the background black.
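The question's code leaves dark text (0) on a white page (255), which is the opposite of what this answer expects. A minimal sketch of the inversion in plain Python, assuming the image is held as a list of rows of 8-bit values (the helper name invert_binary is hypothetical):

```python
def invert_binary(image):
    """Swap black and white in an 8-bit binary image stored as rows of ints."""
    return [[255 - pixel for pixel in row] for row in image]


# Dark text (0) on a white page (255), as produced by the question's code
page = [
    [255, 255, 255],
    [255, 0, 255],
]
inverted = invert_binary(page)  # text is now 255, background is 0
```

If the image is a NumPy uint8 array such as warped, the expression 255 - warped performs the same inversion in one step.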
You then need to compute the rotation angle of your bill. A simple approach is to find the minAreaRect of all white points (findNonZero), which gives you the rotated bounding box of the text.
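One detail of the C++ listing in this answer is easy to miss: when the box returned by minAreaRect comes out wider than tall, the listing swaps the two sides and adds 90 degrees to the angle. A minimal sketch of that normalization in plain Python, assuming the scanned document is in portrait orientation (the function name normalize_box is hypothetical and not part of OpenCV):

```python
def normalize_box(width, height, angle):
    """Return (width, height, angle) with the longer side reported as height.

    Mirrors the swap in the C++ listing: a box that is wider than tall is
    re-expressed as the same rectangle turned by 90 degrees.
    """
    if width > height:
        width, height = height, width
        angle += 90.0
    return width, height, angle


# A box reported as 300 wide, 100 tall, at -80 degrees
box = normalize_box(300.0, 100.0, -80.0)
```

The corrected angle is the one to pass on to the rotation step. A landscape document would need the opposite test.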
Then you can rotate the bill so that the text is horizontal.
Now you can compute the horizontal projection (reduce), taking the average value of each row. Apply a threshold th to the projection to account for some noise in the image (here I used 0, i.e. no noise tolerance). In the resulting binary histogram, rows that contain only background have a value > 0, and rows that contain text have the value 0. Then take the average bin coordinate of each continuous sequence of white bins in the histogram. Those averages are the y coordinates at which to split the lines.
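The projection step can be sketched in plain Python as follows. This is an illustration rather than a translation of the answer's code: the function names are hypothetical, the image is assumed to be a list of rows with white text (255) on a black background (0), and unlike the C++ listing it also reports a whitespace run that touches the bottom edge.

```python
from itertools import groupby


def row_means(image):
    """Average pixel value of each row (the horizontal projection)."""
    return [sum(row) / len(row) for row in image]


def whitespace_centers(image, th=0.0):
    """Return the middle y coordinate of every run of background-only rows.

    A row counts as whitespace when its mean is <= th, so a larger th
    tolerates a few stray white pixels in an otherwise empty row.
    """
    is_space = [mean <= th for mean in row_means(image)]
    centers = []
    y = 0
    for flag, run in groupby(is_space):
        length = len(list(run))
        if flag:
            # rows y .. y + length - 1 are whitespace; take their midpoint
            centers.append(y + (length - 1) // 2)
        y += length
    return centers


image = [
    [0, 0, 0, 0],      # whitespace
    [0, 0, 0, 0],      # whitespace
    [0, 255, 255, 0],  # text
    [255, 255, 0, 0],  # text
    [0, 0, 0, 0],      # whitespace
    [0, 0, 255, 0],    # whitespace with one noisy pixel (mean 63.75)
    [0, 0, 0, 0],      # whitespace
    [255, 0, 255, 0],  # text
]
strict = whitespace_centers(image, th=0.0)
tolerant = whitespace_centers(image, th=64.0)
```

With th=0 the noisy pixel in row 5 splits the second gap into two runs. Raising th above that row's mean merges them back into a single gap, which is the noise resistance the question asks about.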
Here is the code. It's in C++, but since most of the work is done by OpenCV functions, it should be easy to convert to Python. At the very least, you can use it as a reference:
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

int main()
{
    // Read image
    Mat3b img = imread("path_to_image");

    // Binarize image. Text is white, background is black
    Mat1b bin;
    cvtColor(img, bin, COLOR_BGR2GRAY);
    bin = bin < 200;

    // Find all white pixels
    vector<Point> pts;
    findNonZero(bin, pts);

    // Get rotated rect of white pixels
    RotatedRect box = minAreaRect(pts);
    if (box.size.width > box.size.height)
    {
        // Report the longer side as the height and adjust the angle to match
        swap(box.size.width, box.size.height);
        box.angle += 90.f;
    }

    // Draw the rotated rect on the input image
    Point2f vertices[4];
    box.points(vertices);
    for (int i = 0; i < 4; ++i)
    {
        line(img, vertices[i], vertices[(i + 1) % 4], Scalar(0, 255, 0));
    }

    // Rotate the image according to the found angle
    Mat1b rotated;
    Mat M = getRotationMatrix2D(box.center, box.angle, 1.0);
    warpAffine(bin, rotated, M, bin.size());

    // Compute horizontal projections
    // (CV_REDUCE_AVG is the OpenCV 2.x name; OpenCV 3 and later call it REDUCE_AVG)
    Mat1f horProj;
    reduce(rotated, horProj, 1, CV_REDUCE_AVG);

    // Remove noise in histogram. White bins identify space lines, black bins identify text lines
    float th = 0;
    Mat1b hist = horProj <= th;

    // Get mean coordinate of white pixel groups
    vector<int> ycoords;
    int y = 0;
    int count = 0;
    bool isSpace = false;
    for (int i = 0; i < rotated.rows; ++i)
    {
        if (!isSpace)
        {
            if (hist(i))
            {
                // Start of a new run of space rows
                isSpace = true;
                count = 1;
                y = i;
            }
        }
        else
        {
            if (!hist(i))
            {
                // End of the run: store the mean row index
                isSpace = false;
                ycoords.push_back(y / count);
            }
            else
            {
                y += i;
                count++;
            }
        }
    }

    // Draw line as final result
    Mat3b result;
    cvtColor(rotated, result, COLOR_GRAY2BGR);
    for (size_t i = 0; i < ycoords.size(); ++i)
    {
        line(result, Point(0, ycoords[i]), Point(result.cols, ycoords[i]), Scalar(0, 255, 0));
    }

    return 0;
}