How does 2D convolution for images work?

Question

I am studying image processing these days and I am a beginner in the subject. I got stuck on convolution and how to implement it for images. In brief, there is a general formula for the convolution of images, like so:

x(n1,n2) represents a pixel in the output image, but I do not know what k1 and k2 stand for. This is what I would like to learn. In order to implement this in some programming language, I need to know what k1 and k2 stand for. Can someone explain this to me or point me to an article? I would really appreciate any help.

Recommended answer

Convolution in this case deals with extracting patches of image pixels that surround a target image pixel. When you perform image convolution, you do so with what is known as a mask, point spread function, or kernel, and this is usually much smaller than the image itself.

For each target image pixel in the output image, you grab a neighbourhood of pixel values from the input, including the pixel that is at the same target coordinates in the input. This neighbourhood is exactly the same size as the mask. At that point, you rotate the mask by 180 degrees, then multiply each value in the mask, element by element, with the pixel value that coincides with it in the neighbourhood. You add all of these products up, and that is the output for the target pixel in the target image.
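The procedure just described can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `convolve_at`, the use of nested lists for the image and mask, the 0-based indexing, and the requirement of a square, odd-sized mask are all choices made here, not part of the original answer. It also assumes the whole neighbourhood lies inside the image.

```python
def convolve_at(image, mask, row, col):
    """Convolution output at one pixel (0-based row, col).

    Assumes a square mask with odd side length and a target pixel far
    enough from the border that the neighbourhood stays inside the image.
    """
    size = len(mask)
    half = size // 2
    # Rotate the mask by 180 degrees: reverse the row order, then each row.
    rotated = [mask_row[::-1] for mask_row in mask[::-1]]
    total = 0.0
    for i in range(size):
        for j in range(size):
            # image[row - half + i][col - half + j] is the neighbourhood
            # pixel that coincides with rotated[i][j].
            total += rotated[i][j] * image[row - half + i][col - half + j]
    return total
```

With a mask that is zero everywhere except for a 1 in its top-left corner, the rotation moves that 1 to the bottom-right, so the output at a pixel is the input value one row down and one column to the right.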

For example, let's say I had this small image:

 1   2   3   4   5
 6   7   8   9  10
11  12  13  14  15
16  17  18  19  20
21  22  23  24  25

And let's say I wanted to perform an averaging within a 3 x 3 window, so my mask would be:

    [1  1  1]
1/9*[1  1  1]
    [1  1  1]

To perform 2D image convolution, we first rotate the mask by 180 degrees, which here still gives us the same mask. Now let's say I wanted to find the output at row 2, column 2. The 3 x 3 neighbourhood I would extract is:

1  2  3
6  7  8
11 12 13

To find the output, I would multiply each value in the mask by the value at the same location in the neighbourhood:

[1  2  3 ]           [1 1 1]
[6  7  8 ]  .* (1/9)*[1 1 1]
[11 12 13]           [1 1 1]

Performing a point-by-point multiplication and adding the values gives us:

1(1/9) + 2(1/9) + 3(1/9) + 6(1/9) + 7(1/9) + 8(1/9) + 11(1/9) + 12(1/9) + 13(1/9) = 63/9 = 7

The output at location (2,2) in the output image would be 7.
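The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, the image is stored as nested lists, and the code uses 0-based indices, so row 2, column 2 of the text becomes index (1, 1).

```python
# The 5 x 5 example image holding the values 1..25, row by row.
image = [[5 * r + c + 1 for c in range(5)] for r in range(5)]

# The 3 x 3 averaging mask: every entry is 1/9.
mask = [[1 / 9] * 3 for _ in range(3)]

# Row 2, column 2 in the 1-based numbering of the text.
row, col = 1, 1

# Rotating by 180 degrees leaves this particular mask unchanged,
# but the step is kept because convolution requires it in general.
rotated = [mask_row[::-1] for mask_row in mask[::-1]]

# Multiply the mask and the neighbourhood element by element, then sum.
output = sum(rotated[i][j] * image[row - 1 + i][col - 1 + j]
             for i in range(3) for j in range(3))

print(round(output, 6))  # 7.0
```

The rounding only hides floating-point noise from adding nine multiples of 1/9.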

Bear in mind that I didn't tackle the case where the mask would go out of bounds. Specifically, if I tried to find the output at row 1, column 1 for example, there would be five locations where the mask would go out of bounds. There are many ways to handle this. Some people consider those pixels outside to be zero. Other people like to replicate the image border so that the border pixels are copied outside of the image dimensions. Some people like to pad the image using more sophisticated techniques like doing symmetric padding where the border pixels are a mirror reflection of what's inside the image, or a circular padding where the border pixels are copied from the other side of the image.
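To make these border strategies concrete, here is a small sketch that looks up a pixel which may lie outside the image. The function names `padded_pixel` and `reflect` and the mode names `"zero"`, `"replicate"`, `"symmetric"` and `"circular"` are hypothetical labels chosen for this illustration. The symmetric mode assumes the mirror repeats the border pixel itself, which is one of several mirroring conventions.

```python
def reflect(index, length):
    """Mirror an out-of-range index back inside, repeating the border pixel."""
    period = 2 * length
    index %= period
    return index if index < length else period - 1 - index


def padded_pixel(image, row, col, mode="zero"):
    """Return the pixel at (row, col), where the position may be out of bounds."""
    rows, cols = len(image), len(image[0])
    if mode == "zero":
        inside = 0 <= row < rows and 0 <= col < cols
        return image[row][col] if inside else 0
    if mode == "replicate":
        # Clamp to the nearest border pixel.
        r = min(max(row, 0), rows - 1)
        c = min(max(col, 0), cols - 1)
    elif mode == "symmetric":
        # Mirror reflection of what is inside the image.
        r, c = reflect(row, rows), reflect(col, cols)
    elif mode == "circular":
        # Wrap around to the other side of the image.
        r, c = row % rows, col % cols
    else:
        raise ValueError("unknown padding mode: " + mode)
    return image[r][c]
```

For the 5 x 5 example image, the position one row above the top-left pixel reads as 0 with zero padding, 1 with replication, and 21 with circular padding.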

That's beyond the scope of this post, but in your case, start with the simplest approach: when you collect a neighbourhood, set any pixels that fall outside the bounds of the image to zero.
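A minimal sketch of that simplest case, convolving a whole image while treating out-of-bounds pixels as zero, could look like the following. The function name `convolve2d_zero`, the nested-list representation, and the assumption of an odd-sized mask are choices made for this example.

```python
def convolve2d_zero(image, mask):
    """Convolve a 2D image with an odd-sized mask, using zeros outside the image."""
    rows, cols = len(image), len(image[0])
    m_rows, m_cols = len(mask), len(mask[0])
    half_r, half_c = m_rows // 2, m_cols // 2
    # Rotate the mask by 180 degrees once, up front.
    rotated = [mask_row[::-1] for mask_row in mask[::-1]]
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(m_rows):
                for j in range(m_cols):
                    rr, cc = r - half_r + i, c - half_c + j
                    # Out-of-bounds pixels count as zero, so skip them.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += rotated[i][j] * image[rr][cc]
            output[r][c] = total
    return output
```

Applied to the 5 x 5 example image and the 3 x 3 averaging mask, the top-left output is (1 + 2 + 6 + 7) / 9, because only four pixels of that neighbourhood lie inside the image.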

Now, what do k1 and k2 mean? k1 and k2 denote the offsets with respect to the centre of the neighbourhood and mask. Notice that n1 - k1 and n2 - k2 are the important terms in the sum. The output position is denoted by n1 and n2. Therefore, n1 - k1 and n2 - k2 are positions offset from this centre, n1 - k1 in the horizontal sense and n2 - k2 in the vertical sense. If we had a 3 x 3 mask, the centre would be k1 = k2 = 0, the top-left corner would be k1 = k2 = -1, and the bottom-right corner would be k1 = k2 = 1. The sums go to infinity only to make sure that we cover all elements in the mask. Masks are finite in size, so the terms outside the mask contribute nothing, and the sum simplifies to the point-by-point summation I was talking about earlier.
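As a sketch of how the infinite sum collapses, assume the formula in the question is the standard 2D discrete convolution. The symbols f (input image), h (mask) and g (output image) are labels chosen here for illustration and may differ from the notation of the original formula:

```latex
% General form: the sums run over all integer offsets k_1, k_2.
g(n_1, n_2) = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty}
              h(k_1, k_2)\, f(n_1 - k_1,\, n_2 - k_2)

% For a 3 x 3 mask, h(k_1, k_2) = 0 outside -1 \le k_1, k_2 \le 1,
% so only nine terms survive:
g(n_1, n_2) = \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1}
              h(k_1, k_2)\, f(n_1 - k_1,\, n_2 - k_2)
```

The minus signs in n1 - k1 and n2 - k2 are what amount to rotating the mask by 180 degrees: the mask entry at offset (-1, -1) multiplies the image pixel at offset (+1, +1) from the output position.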

Here's a better illustration where the mask is a vertical Sobel filter which finds vertical gradients in an image:

Source: http://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/

As you can see, for each output pixel in the target image, we look at a neighbourhood of pixels at the same spatial location in the input image, which is 3 x 3 in this case. We perform a weighted element-by-element sum between the mask and the neighbourhood, and we set the output pixel to the total of these weighted elements. Bear in mind that this example does not rotate the mask by 180 degrees, but that is what you do when it comes to convolution.
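The effect of the rotation is easy to see with a mask that is not symmetric. The sketch below uses one common form of the Sobel kernel and a made-up 3 x 3 neighbourhood. Both are assumptions for illustration, since the values in the original figure are not reproduced here, and the helper name `weighted_sum` is hypothetical.

```python
# One common form of the Sobel kernel (assumed, not taken from the figure).
sobel = [[-1, 0, 1],
         [-2, 0, 2],
         [-1, 0, 1]]

# A made-up neighbourhood with a dark-to-bright edge running left to right.
patch = [[10, 10, 50],
         [10, 10, 50],
         [10, 10, 50]]


def weighted_sum(mask, neighbourhood):
    """Element-by-element products of mask and neighbourhood, summed."""
    return sum(mask[i][j] * neighbourhood[i][j]
               for i in range(len(mask)) for j in range(len(mask[0])))


# Without rotation (what the illustration does).
unrotated = weighted_sum(sobel, patch)

# With the mask rotated by 180 degrees (what convolution does).
rotated_mask = [row[::-1] for row in sobel[::-1]]
convolved = weighted_sum(rotated_mask, patch)

print(unrotated, convolved)  # 160 -160
```

For this kernel, rotating by 180 degrees negates every entry, so the two results differ only in sign. For a symmetric mask such as the averaging mask, the rotation changes nothing.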

Hope this helps!
