Not getting what 'spatial weights' for HOG are


Problem description

I am using HOG for sunflower detection. I understand most of what HOG is doing now, but have some things that I do not understand in the final stages. (I am going through the MATLAB code from Mathworks).

Let us assume we are using the Dalal-Triggs implementation. That is, 8x8 pixels make 1 cell, 2x2 cells make 1 block, blocks are taken at 50% overlap in both directions, and the histograms are quantized into 9 unsigned bins (meaning they cover 0 to 180 degrees). Finally, our image here is 64x128 pixels.
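As a quick sanity check on these parameters, the block layout and descriptor length they imply can be worked out with a few lines of arithmetic. This is a minimal Python sketch; the function name `hog_layout` and its arguments are hypothetical and are not part of the MATLAB code.

```python
def hog_layout(img_w, img_h, cell=8, block_cells=2, bins=9):
    """Return (blocks_x, blocks_y, values_per_block, descriptor_length)."""
    cells_x, cells_y = img_w // cell, img_h // cell
    # 50% overlap: the block advances by half its size, i.e. one cell for 2x2 blocks
    stride = block_cells // 2
    blocks_x = (cells_x - block_cells) // stride + 1
    blocks_y = (cells_y - block_cells) // stride + 1
    per_block = block_cells * block_cells * bins
    return blocks_x, blocks_y, per_block, blocks_x * blocks_y * per_block

layout = hog_layout(64, 128)
print(layout)  # (7, 15, 36, 3780)
```

So a 64x128 window gives 7x15 = 105 overlapping blocks, each contributing 4 cells x 9 bins = 36 values.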

Let us say that we are on the first block. This block has 4 cells. I understand that we are going to weight each pixel's orientation vote by its gradient magnitude. I also understand that we are going to weight the votes further by a Gaussian centered on the block.

So far so good.

However in the MATLAB implementation, they have an additional step, whereby they create a 'spatial' weight:

If we dive into this function, it looks like this:

Finally, the function 'computeLowerHistBin' looks like this:

function [x1, b1] = computeLowerHistBin(x, binWidth)
% Bin index
width    = single(binWidth);
invWidth = 1./width;
bin      = floor(x.*invWidth - 0.5);

% Bin center x1
x1 = width * (bin + 0.5);

% add 2 to get to 1-based indexing
b1 = int32(bin + 2);
end

Now, I believe that those 'spatial' weights are being used during the tri-linear interpolation part later on... but what I do not get is just how exactly they are being computed, or the logic behind that code. I am completely lost on this issue.

Note: I understand the need for the tri-linear interpolation, and (I think) how it works. What I do not understand is why we need those 'spatial weights', and what the logic behind their computation here is.

Thanks.

Solution

This code is pre-computing the spatial weights for the trilinear interpolation. Take a look at the equation here for trilinear interpolation:

HOG Trilinear Interpolation of Histogram Bins

There you see things like (x-x1)/bx, (y-y1)/by, (1 - (x-x1)/bx), etc. In the code, wx1 and wy1 correspond to:

wx1 = (1 - (x-x1)/bx)
wy1 = (1 - (y-y1)/by)

Here, x1 and y1 are the centers of the histogram bins for the X and Y directions. It's easier to describe these things in 1D. In 1D, a value x will fall between 2 bin centers, x1 <= x < x2. It doesn't matter exactly which bin (1 or 2) it belongs to. The important thing is to figure out the fraction of x that belongs to x1; the rest belongs to x2. Taking the distance from x to x1 and dividing by the width of the bin gives a fractional distance. 1 minus that is the fraction that belongs to bin 1. So if x == x1, wx1 is 1. And if x == x2, wx1 is zero, because x2 - x1 == bx (the width of a bin).
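The 1D case can be sketched in a few lines. This is an illustrative Python version, not the MATLAB source; the helper name `split_vote_1d` is made up for the example.

```python
def split_vote_1d(x, x1, bin_width):
    """Split a vote at position x between the lower bin center x1
    and the upper bin center x2 = x1 + bin_width."""
    w1 = 1.0 - (x - x1) / bin_width  # 1 at x == x1, falls linearly to 0 at x == x2
    w2 = 1.0 - w1                    # the remainder goes to the upper bin
    return w1, w2

# Bin centers at 10 and 30 (bin width 20), sample at x = 15
print(split_vote_1d(15.0, 10.0, 20.0))  # (0.75, 0.25)
```

A sample a quarter of the way from x1 to x2 gives three quarters of its vote to bin 1 and one quarter to bin 2.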

Going back to the code: the part that creates the 4 matrices is just pre-computing all the multiplications of the weights needed for the interpolation of all the pixels in a HOG block. That is why it is a matrix of weights: each element in the matrix is for one of the pixels in the HOG block.

For example, if you look at the equation for the weights for h(x1, y2, ~), you'll see these 2 weights for x and y (ignoring the z component):

(1 - (x-x1)/bx) * ((y-y1)/by)

Going back to the code, this multiplication is pre-computed for every pixel in the block using:

weights.x1y2 = (1-wy1)' * wx1;

where

(1-wy1) == (y - y1)/by

The same logic applies to the other weight matrices.
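To make the pre-computation concrete, here is a small Python sketch of the four outer products. Only `x1y2` is spelled out above; the other three names (`x1y1`, `x2y1`, `x2y2`) and the helper functions are assumptions that follow the same pattern, and the tiny weight vectors are invented for illustration.

```python
def outer(col, row):
    """Outer product: result[i][j] = col[i] * row[j] (rows follow y, columns follow x)."""
    return [[c * r for r in row] for c in col]

def spatial_weights(wx1, wy1):
    """Pre-compute the four bilinear spatial weight matrices for a block."""
    wx2 = [1.0 - w for w in wx1]  # (x - x1)/bx for each pixel column
    wy2 = [1.0 - w for w in wy1]  # (y - y1)/by for each pixel row
    return {
        "x1y1": outer(wy1, wx1),
        "x2y1": outer(wy1, wx2),
        "x1y2": outer(wy2, wx1),  # mirrors weights.x1y2 = (1-wy1)' * wx1
        "x2y2": outer(wy2, wx2),
    }

wx1 = [1.0, 0.75, 0.5, 0.25]  # made-up per-column weights
wy1 = [1.0, 0.5]              # made-up per-row weights
w = spatial_weights(wx1, wy1)
print(w["x1y2"][1][1])  # 0.375, i.e. (1 - 0.5) * 0.75
```

For every pixel the four weights sum to 1, so each pixel's vote is split across the four neighbouring cell histograms without being gained or lost.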

As for the code in "computeLowerHistBin", it's just finding the x1 in the trilinear interpolation equation, where x1 <= x < x2 (same for y1). There are probably a bunch of ways to solve this problem given a pixel location x and the width of a bin bx as long as you satisfy x1 <= x < x2.

For example, "|" indicates bin edges and "o" marks the bin centers.

-20             0              20               40
 |------o-------|-------o-------|-------o-------|
       -10              10              30

If x = [2 9 11], the lower bin centers x1 are [-10 -10 10].
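The example can be checked with a direct Python port of computeLowerHistBin. This is a sketch under the assumption that plain double-precision arithmetic behaves like the MATLAB single-precision version for these inputs; the snake_case name is mine.

```python
import math

def compute_lower_hist_bin(x, bin_width):
    """Return (x1, b1): the lower bin center for x and its 1-based bin index."""
    # Shifting by 0.5 makes floor() pick the bin whose CENTER is at or below x
    b = math.floor(x / bin_width - 0.5)
    x1 = bin_width * (b + 0.5)  # center of that bin
    b1 = b + 2                  # add 2 to get to 1-based indexing, as in the MATLAB code
    return x1, b1

for x in (2, 9, 11):
    x1, b1 = compute_lower_hist_bin(x, 20)
    print(x, x1, b1)
# 2 -10.0 1
# 9 -10.0 1
# 11 10.0 2
```

In each case x1 <= x < x1 + 20 holds, which is the only property the interpolation needs from this function.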
