Extract a page from a uniform background in an image


Problem description

If I have an image in which a page of text was photographed on a uniform background, how can I automatically detect the boundary between the paper and the background?

An example of the image I want to detect is shown below. The images that I will be dealing with consist of a single page on a uniform background and they can be rotated at any angle.

Solution

One simple method would be to threshold the image by some known value once you convert the image to grayscale. The problem with that approach is that we are applying a global threshold and so some of the paper at the bottom of the image will be lost if you make the threshold too high. If you make the threshold too low, then you'll certainly get the paper, but you'll include a lot of the background pixels too and it will probably be difficult to remove those pixels with post-processing.
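
To make the trade-off concrete, here is a minimal sketch on a synthetic scene rather than the photo from the question. The intensities, the illumination gradient and the two threshold values (100 and 60) are assumptions chosen only for illustration:

```matlab
% Synthetic 100 x 100 scene (assumed values): a bright "page" on a darker
% background, with illumination fading from top to bottom.
rows = 100; cols = 100;
scene = 80 * ones(rows, cols);          % uniform background
scene(21:80, 31:70) = 220;              % the page
shade = linspace(1, 0.3, rows)';        % brightness drops to 30% at the bottom
im = scene .* repmat(shade, 1, cols);

page = false(rows, cols);
page(21:80, 31:70) = true;              % ground-truth page mask

highMask = im > 100;                    % a high global threshold
lowMask  = im > 60;                     % a low global threshold

missedHigh = nnz(page & ~highMask);     % page pixels lost by the high threshold
missedLow  = nnz(page & ~lowMask);      % page pixels lost by the low threshold
extraLow   = nnz(~page & lowMask);      % background pixels kept by the low one

fprintf('high threshold misses %d page pixels\n', missedHigh);
fprintf('low threshold keeps %d background pixels\n', extraLow);
```

In this toy scene the high threshold drops the dimly lit bottom edge of the page, while the low threshold keeps the whole page but also lets through the brightly lit background near the top.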

One thing I can suggest is to use an adaptive threshold algorithm. An algorithm that has worked for me in the past is the Bradley-Roth adaptive thresholding algorithm. You can read up about it here on a post I commented on a while back:

Bradley Adaptive Thresholding -- Confused (questions)

However, if you want the gist of it, an integral image of the grayscale version of the image is computed first. The integral image is important because it allows you to calculate the sum of the pixels within any rectangular window in O(1) time. Computing the integral image itself takes time proportional to the number of pixels (O(n^2) for an n x n image), but you only have to do that once. With the integral image, you centre an s x s neighbourhood on each pixel and compare the pixel's intensity with the average intensity of that window: if the pixel is at least t% below the window average, it is classified as background. Otherwise, it is classified as part of the foreground. This is adaptive because the thresholding is done using local pixel neighbourhoods rather than using a global threshold.
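
As a small illustration of the integral-image lookup described above, the following sketch uses magic(6) as stand-in image data and an arbitrarily chosen window. Padding the integral image with a leading row and column of zeros is a convenience assumed here, not something the implementation further down does:

```matlab
im = magic(6);                       % small stand-in "image" (assumed data)

% Integral image: entry (r, c) holds the sum of im(1:r, 1:c).
I = cumsum(cumsum(im, 2), 1);

% Pad with a leading row and column of zeros so that windows touching the
% top or left border need no special handling.
P = zeros(size(im) + 1);
P(2:end, 2:end) = I;

% Sum of im(r1:r2, c1:c2) from four lookups, whatever the window size.
r1 = 2; r2 = 4; c1 = 3; c2 = 5;
windowSum = P(r2+1, c2+1) - P(r1, c2+1) - P(r2+1, c1) + P(r1, c1);

% Brute-force check of the same sum.
directSum = sum(sum(im(r1:r2, c1:c2)));

% Local mean, and the threshold test for the window's centre pixel (3, 4)
% with t = 15 (percent).
t = 15;
windowMean = windowSum / ((r2 - r1 + 1) * (c2 - c1 + 1));
isForeground = im(3, 4) > windowMean * (100 - t) / 100;
```

The four-lookup sum matches the brute-force sum, and its cost does not depend on how large the window is.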

I've coded an implementation of the Bradley-Roth algorithm here for you. The default parameters for the algorithm are s being 1/8th of the width of the image and t being 15%. Therefore, you can just call it this way to invoke the default parameters:

out = adaptiveThreshold(im);

im is the input image and out is a binary image that denotes what belongs to the foreground (logical true) or the background (logical false). You can play around with the second and third input parameters, s being the size of the thresholding window and t the percentage we talked about above, by calling the function like so:

out = adaptiveThreshold(im, s, t);

Therefore, the code for the algorithm looks like this:

function [out] = adaptiveThreshold(im, s, t)

%// Error checking of the input
%// Default value for s is 1/8th the width of the image
%// Must make sure that this is a whole number
if nargin <= 1, s = round(size(im,2) / 8); end

%// Default value for t is 15
%// t is used to determine whether the current pixel is t% lower than the
%// average in the particular neighbourhood
if nargin <= 2, t = 15; end

%// Too few or too many arguments?
if nargin == 0, error('Too few arguments'); end
if nargin >= 4, error('Too many arguments'); end

%// Convert to grayscale if necessary then cast to double to ensure no
%// saturation
if size(im, 3) == 3
    im = double(rgb2gray(im));
elseif size(im, 3) == 1
    im = double(im);
else
    error('Incompatible image: Must be a colour or grayscale image');
end

%// Compute integral image
intImage = cumsum(cumsum(im, 2), 1);

%// Define grid of points
[rows, cols] = size(im);
[X,Y] = meshgrid(1:cols, 1:rows);

%// Ensure s is even so that we are able to index the image properly
s = s + mod(s,2);

%// Access the four corners of each neighbourhood
x1 = X - s/2; x2 = X + s/2;
y1 = Y - s/2; y2 = Y + s/2;

%// Ensure no co-ordinates are out of bounds
x1(x1 < 1) = 1;
x2(x2 > cols) = cols;
y1(y1 < 1) = 1;
y2(y2 > rows) = rows;

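%// Count how many pixels there are in each neighbourhood
%// (an approximation: away from the top and left borders the sums below
%// span one extra row and column, which is negligible when s is large)
count = (x2 - x1) .* (y2 - y1);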

%// Compute row and column co-ordinates to access each corner of the
%// neighbourhood for the integral image
f1_x = x2; f1_y = y2;
f2_x = x2; f2_y = y1 - 1; f2_y(f2_y < 1) = 1;
f3_x = x1 - 1; f3_x(f3_x < 1) = 1; f3_y = y2;
f4_x = f3_x; f4_y = f2_y;

%// Compute 1D linear indices for each of the corners
ind_f1 = sub2ind([rows cols], f1_y, f1_x);
ind_f2 = sub2ind([rows cols], f2_y, f2_x);
ind_f3 = sub2ind([rows cols], f3_y, f3_x);
ind_f4 = sub2ind([rows cols], f4_y, f4_x);

%// Calculate the areas for each of the neighbourhoods
sums = intImage(ind_f1) - intImage(ind_f2) - intImage(ind_f3) + ...
    intImage(ind_f4);

%// Determine whether the summed area surpasses a threshold
%// Set this output to 0 if it doesn't
locs = (im .* count) <= (sums * (100 - t) / 100);
out = true(size(im));
out(locs) = false;

end


When I use your image and I set s = 500 and t = 5, here's the code and this is the image I get:

im = imread('http://i.stack.imgur.com/MEcaz.jpg');
out = adaptiveThreshold(im, 500, 5);
imshow(out);

You can see that there are some spurious white pixels at the bottom of the image, and there are some holes inside the paper that we need to fill in. As such, let's use some morphology: declare a structuring element that's a 15 x 15 square, perform an opening to remove the noisy pixels, then fill in the holes when we're done:

se = strel('square', 15);
out = imopen(out, se);
out = imfill(out, 'holes');
imshow(out);

This is what I get after all of that:
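
To see what each of the two morphological steps contributes, here is a rough sketch on a tiny synthetic mask. The mask layout and the 5 x 5 structuring element are assumptions scaled down for the example; like the code above, it relies on strel, imopen and imfill from the Image Processing Toolbox:

```matlab
mask = false(60, 60);
mask(11:50, 11:50) = true;      % the "page"
mask(25:28, 25:28) = false;     % a hole inside the page (e.g. dark text)
mask(55:57, 3:5)   = true;      % a small spurious blob in the background

se = strel('square', 5);

% Opening removes foreground blobs that the 5 x 5 square cannot fit into.
opened = imopen(mask, se);

% Filling then closes any background region fully enclosed by foreground.
cleaned = imfill(opened, 'holes');

% The ideal result for this toy mask: the solid page and nothing else.
expected = false(60, 60);
expected(11:50, 11:50) = true;
```

After the opening the 3 x 3 blob is gone but the hole is still there; after the fill, cleaned should match expected.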

Not bad eh? Now if you really want to see what the image looks like with the paper segmented, we can use this mask and multiply it with the original image. This way, any pixels that belong to the paper are kept while those that belong to the background go away:

out_colour = bsxfun(@times, im, uint8(out));
imshow(out_colour);

We get this:
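
For reference, here is a self-contained sketch of the same masking step on a tiny made-up image, followed by a crop to the mask's axis-aligned bounding box. The image, the mask and the cropping step are illustrative assumptions; for a rotated page the box will still contain some background at the corners:

```matlab
% Hypothetical 6 x 8 RGB image with no zero-valued pixels, and a page mask.
im = uint8(randi([50 255], 6, 8, 3));
mask = false(6, 8);
mask(2:4, 3:6) = true;

% Same masking step as above: the 2-D mask is expanded across the three
% colour channels, zeroing everything outside the page.
masked = bsxfun(@times, im, uint8(mask));

% Axis-aligned bounding box of the mask, then crop to it.
rowsIn = find(any(mask, 2));
colsIn = find(any(mask, 1));
cropped = masked(rowsIn(1):rowsIn(end), colsIn(1):colsIn(end), :);
```

In MATLAB R2016b and later, im .* uint8(mask) gives the same result through implicit expansion, so bsxfun is only needed for older releases.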

You'll have to play around with the parameters until it works for you, but the above parameters were the ones I used to get it working for the particular page you showed us. Image processing is all about trial and error, and putting processing steps in the right sequence until you get something good enough for your purposes.


Happy image filtering!
