HOG: What is done in the contrast-normalization step?


Question


According to the HOG process, as described in the paper Histogram of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.

I don't understand something - If I already computed the cells' weighted gradients, how can the normalization of the image's contrast help me now?

As far as I understand, contrast normalization is done on the original image, whereas for computing the gradients, I already computed the X,Y derivatives of the ORIGINAL image. So, if I normalize the contrast and I want it to take effect, I should compute everything again.

Is there something I don't understand well?

Should I normalize the cells' values?

Or is the normalization in HOG not about contrast at all, but about the histogram values (the counts of pixels in each bin)?

Link to the paper: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf

Answer

The contrast normalization is achieved by normalizing each block's local histogram.

The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract

When you normalize the block histogram, you actually normalize the contrast in this block, if your histogram really contains the sum of magnitudes for each direction.

The term "histogram" is confusing here, because you do not count how many pixels have direction k; instead you sum the magnitudes of those pixels. Thus you can normalize the contrast after computing each block's vector, or even after you have computed the whole vector, assuming you know at which indices in the vector each block starts and ends.
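Concretely, the normalization is just L2 normalization of the block's vector. A minimal sketch (function name my own; eps = 1 is the value used in the steps below):

```python
import math

def normalize_block(v, eps=1.0):
    """L2-normalize a block's concatenated histogram vector.

    Dividing by the norm cancels any constant scaling of the gradient
    magnitudes inside the block - which is exactly what normalizing the
    local contrast amounts to. eps guards against division by zero.
    """
    k = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / k for x in v]

# With eps = 0, doubling the contrast (i.e. all magnitudes) of a block
# leaves its normalized vector unchanged:
v = [0.5, 1.0, 2.0]
a = normalize_block(v, eps=0.0)
b = normalize_block([2 * x for x in v], eps=0.0)
```

With a nonzero eps the invariance is only approximate, but eps keeps all-zero blocks (e.g. flat image regions) from producing NaNs.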

The steps of the algorithm, as I understand them (this worked for me with a 95% success rate), are:

  1. Define the following parameters (in this example, the parameters match those in the HOG for Human Detection paper):

    • A cell size in pixels (e.g. 6x6)
    • A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
    • Block overlapping rate (e.g. 50% ==> Means that both block width and block height in pixels have to be even. It is satisfied in this example, because the cell width and cell height are even (6 pixels), making the block width and height also even)
    • Detection window size. The size must be divisible by half the block size without remainder (so the blocks can be placed exactly, with 50% overlap). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). Same for the window height. In our example, the window width is 63 pixels, and the window height is 126 pixels.
  2. Calculate gradient:

    • Compute the X difference using convolution with the vector [-1 0 1]
    • Compute the Y difference using convolution with the transpose of the above vector
    • Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
    • Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan returns values between -90 and 90 degrees (after converting from radians), while you will probably want values between 0 and 180. So just flip all the negative values by adding +180 degrees. Note that HOG for Human Detection uses unsigned directions (between 0 and 180). If you want to use signed directions, you need a little more effort: if diffX and diffY are positive, your atan value will be between 0 and 90 - leave it as is. If diffX and diffY are negative, you'll get the same range of possible values - here, add +180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you'll get values between -90 and 0 - leave them as they are (you can add +360 if you want them positive). If diffY is positive and diffX is negative, you'll again get the same range, so add +180 to flip the direction to the other side.
    • "Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
  3. Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):

    • Split to cells
    • For each cell, create a vector with 9 members (one for each bin). For each index in the vector, set the sum of the magnitudes of all the pixels with that binned direction. We have 6x6 = 36 pixels in total in a cell. So for example, if 2 pixels have direction 0, the magnitude of the first being 0.231 and of the second 0.13, you should write the value 0.361 (= 0.231 + 0.13) at index 0 of your vector.
    • Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
    • Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you computed k, divide each value in the vector by k - thus your vector will be normalized.
  4. Create final vector:

    • Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318 (the 63x126 window holds 6x13 = 78 overlapping blocks, and 78 * 81 = 6318).
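Putting all four steps together, here is a compact NumPy sketch of the pipeline above (my own illustrative code, not the paper's implementation; the image is assumed to be a single-channel float array, and border pixels are given zero gradients):

```python
import numpy as np

def hog(img, cell=6, block_cells=3, nbins=9, eps=1.0):
    """Sketch of the HOG steps above (unsigned directions, 50% block overlap)."""
    img = img.astype(np.float64)
    # Step 2: gradients - central differences standing in for the
    # [-1 0 1] convolution; border rows/columns are left at zero.
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    dy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.sqrt(dx**2 + dy**2)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0          # unsigned, [0, 180)
    bins = (ang // (180.0 / nbins)).astype(int)           # binned, 0..nbins-1

    h, w = img.shape
    bpx = cell * block_cells                              # block size in pixels
    stride = bpx // 2                                     # 50% overlap
    out = []
    # Step 3: one concatenated, normalized histogram vector per block.
    for by in range(0, h - bpx + 1, stride):
        for bx in range(0, w - bpx + 1, stride):
            vec = []
            for cy in range(block_cells):
                for cx in range(block_cells):
                    y0, x0 = by + cy * cell, bx + cx * cell
                    m = mag[y0:y0 + cell, x0:x0 + cell]
                    b = bins[y0:y0 + cell, x0:x0 + cell]
                    # Cell histogram: sum of magnitudes per direction bin.
                    vec.extend(np.bincount(b.ravel(), weights=m.ravel(),
                                           minlength=nbins))
            vec = np.asarray(vec)
            vec /= np.sqrt(np.sum(vec**2) + eps**2)       # contrast normalization
            out.append(vec)
    # Step 4: concatenate all block vectors into the final descriptor.
    return np.concatenate(out)
```

For the 63x126 window of the example, this yields the 6318-element vector described above: 6 block positions across, 13 down, 78 blocks of 81 values each.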
