Understanding Hough Transform


Question

I am trying to understand MATLAB's code for the Hough Transform.

Some items are clear to me in this picture,

  1. binary_image is the monochrome version of input_image.
  2. hough_lines is a vector containing the lines detected in the image. I see that four lines have been detected.
  3. T contains the thetas in the (ϴ, ρ) space of the image.
  4. R contains the rhos in the (ϴ, ρ) space of the image.

I have the following questions,

  1. Why is the image rotated before applying Hough Transform?
  2. What do the entries in H represent?
  3. Why is H(Hough Matrix) of size 45x180? Where does this size come from?
  4. Why is T of size 1x180? Where does this size come from?
  5. Why is R of size 1x45? Where does this size come from?
  6. What do the entries in P represent? Are they (x, y) or (ϴ, ρ) ?

    29 162
    29 165
    28 170
    21  5
    29 158
    

  7. Why is the value 5 passed into houghpeaks()?
  8. What is the logic behind ceil(0.3*max(H(:)))?

Relevant source code

%   Read image into workspace.
input_image  = imread('Untitled.bmp');

%Rotate the image.
rotated_image = imrotate(input_image,33,'crop');

% Convert RGB to grayscale.
rotated_image = rgb2gray(rotated_image);

%Create a binary image.
binary_image = edge(rotated_image,'canny');

%Create the Hough transform using the binary image.
[H,T,R] = hough(binary_image);

%Find peaks in the Hough transform of the image.
P  = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));

%Find lines
hough_lines = houghlines(binary_image,T,R,P,'FillGap',5,'MinLength',7);    

% Plot the detected lines
figure, imshow(rotated_image), hold on
max_len = 0;

for k = 1:length(hough_lines)
   xy = [hough_lines(k).point1; hough_lines(k).point2];
   plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');

   % Plot beginnings and ends of lines
   plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
   plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');

   % Determine the endpoints of the longest line segment
   len = norm(hough_lines(k).point1 - hough_lines(k).point2);
   if ( len > max_len)
      max_len = len;
      xy_long = xy;
   end
end

% Highlight the longest line segment by coloring it cyan.
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','cyan');

Solution

Those are some good questions. Here are my answers for you:

Why is the image rotated before applying Hough Transform?

I don't believe this is MATLAB's "official example". I took a quick look at the documentation page for the function, and I suspect you pulled this code from another website that we don't have access to. In any case, you generally do not need to rotate an image before using the Hough Transform. The goal of the Hough Transform is to find lines in the image at any orientation, so rotating the image changes the angles that are reported but should not affect whether the lines are found. If I were to guess, the rotation was performed as a preemptive measure because the lines in the "example image" were most likely oriented at a 33 degree angle clockwise. Performing the reverse rotation would make the lines more or less straight.

What do the entries in H represent?

H is what is known as an accumulator matrix. Before we get into what the purpose of H is and how to interpret the matrix, you need to know how the Hough Transform works. With the Hough transform, we first perform an edge detection on the image. This is done using the Canny edge detector in your case. If you recall the Hough Transform, we can parameterize a line using the following relationship:

rho = x*cos(theta) + y*sin(theta)

x and y are the coordinates of a point in the image, and most customarily they are edge points. theta is the angle between the x-axis and the line drawn from the origin that meets the line through the edge point at a right angle. rho is the perpendicular distance from the origin to this line drawn through (x, y) at the angle theta.

Note that the equation can yield infinitely many lines passing through (x, y), so it's common to bin or discretize the possible angles to a predefined number. MATLAB by default assumes there are 180 possible angles in the range [-90, 90) with a sampling interval of 1 degree, that is, [-90, -89, -88, ... , 88, 89]. What you generally do is, for each edge point, search over this predefined set of angles and determine what the corresponding rho is. Afterwards, we count how many times we see each rho and theta pair. Here's a quick example pulled from Wikipedia:

Source: Wikipedia: Hough Transform

Here we see three black dots that follow a straight line. Ideally, the Hough Transform should determine that these black dots together form a straight line. To give you a sense of the calculations, take a look at the example at 30 degrees. Following the definition above, for each point we draw a line through it whose perpendicular from the origin makes an angle of 30 degrees, and then find the perpendicular distance from the origin to that line.

Now what's interesting is that at 60 degrees the perpendicular distance is more or less the same for each point, at about 80 pixels. Seeing the same rho and theta pair for each of the three points is the driving force behind the Hough Transform. Also, what's nice about the above formula is that it will implicitly find the perpendicular distance for you.
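As a small numeric check of this idea, here is a sketch with three made-up collinear points (they are not the points from the Wikipedia figure). The points lie on the line x + y = 10, whose normal is at 45 degrees, so all three should give the same rho at theta = 45 and different values at other angles such as theta = 0.

```matlab
% Three collinear points on the line x + y = 10 (illustrative values only).
pts = [2 8; 5 5; 8 2];            % each row is one (x, y) point

% Evaluate rho = x*cos(theta) + y*sin(theta) at two candidate angles (degrees).
rho_at_0  = pts(:,1)*cosd(0)  + pts(:,2)*sind(0);    % differs from point to point
rho_at_45 = pts(:,1)*cosd(45) + pts(:,2)*sind(45);   % the same for all three points

disp(rho_at_0.');
disp(rho_at_45.');
```

Only the angle that matches the line's normal makes the three rho values agree, which is why that (rho, theta) pair collects the most votes.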

The process of the Hough Transform is very simple. Suppose we have an edge detected image I and a set of angles theta:

For each edge point (x, y) in the image:
    For each angle A in the angles theta:
        Substitute A into: rho = x*cos(A) + y*sin(A)
        Solve for rho to find the perpendicular distance
        Remember this (rho, A) pair and increase the count of how many times you have seen it by 1
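The loop above can be sketched in a few lines of plain MATLAB. This is only an illustration under stated assumptions, not the implementation behind MATLAB's hough function: the 5 x 5 test image is made up, the origin is assumed to be the top-left pixel with zero-based coordinates, and each rho is simply rounded to the nearest integer bin.

```matlab
% Tiny edge image containing one diagonal line (made-up test data).
bw = logical(eye(5));

thetas = -90:89;                                % candidate angles in degrees
D      = ceil(sqrt(sum((size(bw) - 1).^2)));    % integer bound on |rho|
rhos   = -D:D;                                  % rho bins in steps of 1
H      = zeros(numel(rhos), numel(thetas));     % accumulator: rows = rho, cols = theta

[rowIdx, colIdx] = find(bw);                    % locations of the edge pixels
for p = 1:numel(rowIdx)
    x = colIdx(p) - 1;                          % assumed zero-based x coordinate
    y = rowIdx(p) - 1;                          % assumed zero-based y coordinate
    for tj = 1:numel(thetas)
        rho = x*cosd(thetas(tj)) + y*sind(thetas(tj));
        ri  = round(rho) + D + 1;               % nearest rho bin as a 1-based row index
        H(ri, tj) = H(ri, tj) + 1;              % cast one vote for this (rho, theta) pair
    end
end

fprintf('largest bin count: %d\n', max(H(:)));
```

Each edge pixel casts exactly one vote per angle, so the largest possible count equals the number of edge pixels, and that count is reached at the angle where all of the pixels agree on rho.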

So ideally, if we had edge points that follow a straight line, we should see a rho and theta pair where the count of how many times we see this pair is relatively high. This is the purpose of the accumulator matrix H. The rows denote a unique rho value and the columns denote a unique theta value.

An example of this is shown below:

Source: Google Patents

Therefore, using an example from this matrix, in the bin located at theta between 25 and 30 with a rho between 4 and 4.5, we have found 8 edge points that would be characterized by a line given this rho, theta range pair.

Note that rho can also take infinitely many values, so you need to not only restrict the range of rho, but also discretize rho with a sampling interval. The default in MATLAB is 1. A computed rho value will inevitably be a floating point number, so you remove the decimal precision to determine the final rho bin. For the above example the rho resolution is 0.5, which means that, for example, a calculated rho value that falls between 2 and 2.5 falls in the first column. Also note that the theta values are binned in intervals of 5. You traditionally would compute the Hough Transform with a theta sampling interval of 1 and then merge the bins together. With the MATLAB defaults, however, the bin size is 1.

This accumulator matrix tells you how many times an edge point fits a particular rho and theta combination. Therefore, if we see many points that get mapped to a particular rho and theta value, there is a strong chance that a line defined by rho = x*cos(theta) + y*sin(theta) is present there.

Why is H(Hough Matrix) of size 45x180? Where does this size come from?

This is a consequence of the previous point. Take note that the largest distance we would expect from the origin to any point in the image is bounded by the diagonal of the image. This makes sense because going from the top left corner to the bottom right corner, or from the bottom left corner to the top right corner, would give you the greatest distance expected in the image. MATLAB's documentation for hough defines this as D = sqrt((rows - 1)^2 + (cols - 1)^2), where rows and cols are the number of rows and columns of the image.

For the MATLAB defaults, rho spans from -ceil(D) to ceil(D) in steps of 1. Your rows and columns are both 16, so D = sqrt(15^2 + 15^2) = 21.21..., which rounds up to 22. The range of rho therefore spans from -22 to 22, which results in 45 unique rho values. Remember that the default range of theta is [-90, 90) in steps of 1, resulting in 180 unique angle values. Going with this, we have 45 rows and 180 columns in the accumulator matrix and hence H is 45 x 180.
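The sizes can be checked with a few lines of arithmetic. This sketch assumes the 16 x 16 image size mentioned above and uses the diagonal formula from MATLAB's hough documentation, D = sqrt((rows - 1)^2 + (cols - 1)^2), rounded up; the variable names are only for illustration.

```matlab
rows = 16;  cols = 16;              % image size assumed in this answer
rhoRes = 1; thetaRes = 1;           % default resolutions

D = sqrt((rows - 1)^2 + (cols - 1)^2);    % diagonal length of the image
q = ceil(D / rhoRes);                     % number of rho steps on each side of zero
R = (-q:q) * rhoRes;                      % rho values: -22, -21, ..., 22
T = -90:thetaRes:(90 - thetaRes);         % theta values: -90, -89, ..., 89

fprintf('D = %.2f, numel(R) = %d, numel(T) = %d\n', D, numel(R), numel(T));
% prints: D = 21.21, numel(R) = 45, numel(T) = 180
```

The 45 x 180 size of H follows directly from numel(R) and numel(T).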

Why is T of size 1x180? Where does this size come from?

This is an array that tells you all of the angles that were being used in the Hough Transform. This should be an array going from -90 to 89 in steps of 1.

Why is R of size 1x45? Where does this size come from?

This is an array that tells you all of the rho values that were being used in the Hough Transform. This should be an array that spans from -22 to 22 in steps of 1.


What you should take away from this is that each value H(i, j) counts how many edge points produced a rho and theta pair such that R(i) <= rho < R(i + 1) and T(j) <= theta < T(j + 1), where i spans from 1 to 44 and j spans from 1 to 179. In other words, it tells you how many times we see edge points for that particular range of rho and theta.


What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?

P is the output of the houghpeaks function. Basically, this determines what the possible lines are by finding where the peaks in the accumulator matrix occur. P gives you the row and column locations in H where there is a peak. These locations are:

29 162
29 165
28 170
21  5
29 158

Each row gives you a gateway to the rho and theta parameters required to generate the detected line. Specifically, the first line is characterized by rho = R(29) and theta = T(162). The second line is characterized by rho = R(29) and theta = T(165), and so on. To answer your question, the values in P are neither (x, y) nor (ρ, ϴ). They are the row and column locations in H which, when cross-referenced with R and T, give you the parameters that characterize each line detected in the image.
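As a quick illustration of that cross-referencing, the sketch below looks up the rho and theta values for the five peaks, assuming R and T hold the default values described earlier (R from -22 to 22 and T from -90 to 89, both in steps of 1).

```matlab
R = -22:22;                                       % rho values, as described earlier
T = -90:89;                                       % theta values in degrees
P = [29 162; 29 165; 28 170; 21 5; 29 158];       % peak locations from the question

rho   = R(P(:,1));     % first column of P indexes into R
theta = T(P(:,2));     % second column of P indexes into T

disp([rho(:) theta(:)]);   % one (rho, theta) pair per detected peak
```

Under these assumptions the five peaks correspond to (rho, theta) = (6, 71), (6, 74), (5, 79), (-2, -86) and (6, 67).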

Why is the value 5 passed into houghpeaks()?

The extra 5 in houghpeaks specifies the maximum number of peaks, and therefore lines, you'd ideally like to detect. We can see that P has 5 rows, corresponding to 5 lines. If 5 lines can't be found, MATLAB will return as many as it can.

What is the logic behind ceil(0.3*max(H(:)))?

The logic behind this is that if you want to determine peaks in the accumulator matrix, you have to define a minimum threshold that tells you whether a particular rho and theta combination should be considered a valid line. Making this threshold too low would report a lot of false lines, and making it too high would miss a lot of lines. What they decided to do here was find the largest bin count in the accumulator matrix, take 30% of that, take the mathematical ceiling, and treat any values in the accumulator matrix that are at least this large as candidate lines.
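Here is the same threshold computed on a small made-up accumulator matrix. The counts are invented for illustration, and the >= comparison reflects my reading of the 'Threshold' option as the minimum value for a bin to be considered a peak.

```matlab
H = [0 2 1; 7 3 0; 1 11 4];              % made-up accumulator counts
threshold = ceil(0.3 * max(H(:)));       % 0.3 * 11 = 3.3, rounded up to 4

candidates = H >= threshold;             % bins strong enough to be considered peaks
fprintf('threshold = %d, candidate bins = %d\n', threshold, nnz(candidates));
% prints: threshold = 4, candidate bins = 3
```

Tying the threshold to max(H(:)) makes it scale with the image: a long line in a large image produces large counts, and the cut-off rises with it.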


Hope this helps!
