HOG Trilinear Interpolation of Histogram Bins


Question

I am working on Histogram of Oriented Gradients (HOG) features and I am trying to implement the trilinear interpolation of histogram bins as described in Dalal's PhD thesis. He explains the interpolation process as quoted below:

EDIT: Roughly speaking, HOG features are extracted from a 64x128 pixel window which is divided into blocks. Each block consists of 2x2 cells, and a cell is an 8x8 pixel area. Extraction starts with calculating the first-order derivatives of the image; then the orientation and magnitude of the gradient at each pixel are calculated. An orientation histogram is calculated for each 8x8 pixel cell within the block: each pixel contributes its magnitude to the histogram according to its orientation, and the magnitude is interpolated between the neighbouring bin centres in both orientation and position. The histogram contains 9 bins covering 0-180 degrees in steps of 20 degrees. An overall depiction of the algorithm can be seen here: http://4.bp.blogspot.com/_7NBDeKCsVHg/TKBbldI8GmI/AAAAAAAAAG0/G-OXUz1ouPQ/s1600/a1.bmp
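To make the first step concrete, here is a minimal Python sketch of the per-pixel gradient magnitude and unsigned orientation. The function name `gradient_at`, the centred difference mask and the clamped border handling are assumptions made for illustration; they are not taken from the thesis.

```python
import math


def gradient_at(img, r, c):
    """Gradient magnitude and unsigned orientation at pixel (r, c).

    img is a 2-D list of grey values. Centred differences are used and
    indices are clamped at the border (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    gx = img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]
    gy = img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]
    magnitude = math.hypot(gx, gy)
    # Fold the signed angle into the unsigned range [0, 180) degrees.
    orientation = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation


# A patch whose intensity increases downwards: the gradient points along y.
patch = [[0, 0, 0],
         [0, 0, 0],
         [10, 10, 10]]
mag, ori = gradient_at(patch, 1, 1)
```

The magnitude is the weight w that the pixel later votes with, and the orientation is its z coordinate in the histogram.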

We first describe linear interpolation in a one-dimensional space and then extend it to 3-D. Let h be a histogram with inter-bin distance (bandwidth) b. h(x) denotes the value of the histogram for the bin centred at x. Assume that we want to interpolate a weight w at point x into the histogram. Let x1 and x2 be the two nearest neighbouring bins of the point x such that x1 ≤ x < x2. Linear interpolation distributes the weight w into the two nearest neighbours as follows:

$$h(x_1) \leftarrow h(x_1) + w\left(1 - \frac{x - x_1}{b}\right), \qquad h(x_2) \leftarrow h(x_2) + w\left(\frac{x - x_1}{b}\right)$$
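The one-dimensional rule can be sketched in Python as follows. The name `vote_1d`, the convention that bin i is centred at (i + 0.5) * b, and the wrap-around at the ends of the axis (natural for orientation) are assumptions of this sketch rather than something the quoted text specifies.

```python
import math


def vote_1d(hist, x, w, b):
    """Split weight w between the two bins whose centres bracket x.

    Assumes bin i is centred at (i + 0.5) * b and that the axis wraps
    around; both are illustrative assumptions.
    """
    n = len(hist)
    pos = x / b - 0.5        # position measured in bins from the centre of bin 0
    i1 = math.floor(pos)     # index of the lower bin centre x1
    frac = pos - i1          # equals (x - x1) / b, in [0, 1)
    hist[i1 % n] += w * (1.0 - frac)
    hist[(i1 + 1) % n] += w * frac
    return hist


# 9 orientation bins of 20 degrees: centres at 10, 30, ..., 170 degrees.
h = vote_1d([0.0] * 9, x=25.0, w=1.0, b=20.0)
```

With x = 25 the neighbouring centres are x1 = 10 and x2 = 30, so (x - x1) / b = 0.75: a quarter of the weight goes to the first bin and three quarters to the second.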

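Let w at the 3-D point x = [x, y, z] be the weight to be interpolated. Let x1 and x2 be the two corner vectors of the histogram cube containing x, where in each component x1 ≤ x < x2. Assume that the bandwidth of the histogram along the x, y and z axes is given by b = [bx, by, bz]. Trilinear interpolation distributes the weight w to the 8 surrounding bin centres as follows:

$$
\begin{aligned}
h(x_1, y_1, z_1) &\leftarrow h(x_1, y_1, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_1, z_2) &\leftarrow h(x_1, y_1, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_1) &\leftarrow h(x_1, y_2, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_1) &\leftarrow h(x_2, y_1, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_2) &\leftarrow h(x_1, y_2, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_2) &\leftarrow h(x_2, y_1, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_1) &\leftarrow h(x_2, y_2, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_2) &\leftarrow h(x_2, y_2, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right)
\end{aligned}
$$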
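A rough Python sketch of the three-dimensional vote, under stated assumptions: the histogram is a nested list `hist[ix][iy][iz]`, bin i along each axis is centred at (i + 0.5) * b, votes that fall outside the grid in x or y are dropped, and the orientation axis wraps. The name `vote_3d` and these conventions are illustrative and not taken from the thesis.

```python
import math
from itertools import product


def vote_3d(hist, point, w, bandwidth, wrap_z=True):
    """Distribute weight w over the 8 bins surrounding point = (x, y, z)."""
    shape = (len(hist), len(hist[0]), len(hist[0][0]))
    lower, frac = [], []
    for v, b in zip(point, bandwidth):
        pos = v / b - 0.5          # position in bins from the centre of bin 0
        i = math.floor(pos)
        lower.append(i)            # index of the lower corner on this axis
        frac.append(pos - i)       # (v - v1) / b on this axis
    # Visit the 8 corners of the cube; step 0 is the lower corner, 1 the upper.
    for corner in product((0, 1), repeat=3):
        weight = w
        idx = []
        for axis, step in enumerate(corner):
            weight *= frac[axis] if step else 1.0 - frac[axis]
            idx.append(lower[axis] + step)
        if wrap_z:
            idx[2] %= shape[2]
        if all(0 <= idx[a] < shape[a] for a in range(3)):
            hist[idx[0]][idx[1]][idx[2]] += weight
    return hist


def empty_hist(nx=8, ny=16, nz=9):
    """An all-zero histogram with the h(8,16,9) layout from the question."""
    return [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]


# A pixel halfway between cell centres 12 and 20 in x, exactly on a centre in y and z.
h3 = vote_3d(empty_hist(), point=(16.0, 12.0, 30.0), w=1.0, bandwidth=(8.0, 8.0, 20.0))
```

In this example the weight is split equally between `h3[1][1][1]` and `h3[2][1][1]`, because the point sits exactly on a bin centre along y and z and halfway between two centres along x.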

We compute a histogram for each cell, and every pixel contributes its magnitude value to the histogram. What I understand from the formulation is that x and y represent the location of the cells in the detection window and z is the bin number. In a 64x128 detection window there are 8x16 cells and 9 orientation bins, so our histogram is represented as h(8,16,9). If the above statements are correct, do (x1,y1) and (x2,y2) represent the previous and next cells respectively? Do z1 and z2 mean the previous and next orientation bins? What about the bandwidth b=[bx, by, bz]?

I'd really appreciate it if someone could clarify these issues.

Thanks.

Answer

Think of (x1, y1, z1) and (x2, y2, z2) as two points spanning a cube that surrounds the point (x,y,z) for which you want to interpolate a value of h. The set of eight points (x1, y1, z1), (x2, y1, z1), (x1, y2, z1), (x1, y1, z2), (x2, y2, z1), (x2, y1, z2), (x1, y2, z2), (x2, y2, z2) forms the complete cube. So trilinear interpolation between (x1, y1, z1) and (x2, y2, z2) actually means interpolation between the 8 points in the 3D histogram space surrounding the point you are interested in! Now to your questions:

(x1, y1) and (x2, y2) (together with (x1, y2) and (x2, y1)) represent the centres of bins in the (x, y) plane. In your case these would be the centres of the four cells surrounding the pixel.

z1 and z2 represent two bin levels in the orientation direction, as you say. Combined with the four points in the image plane this gives you a total of 8 bins.

The bandwidth b=[bx, by, bz] is basically the distance between the centers of neighbouring bins in the x, y and z direction. In your case, with 8 bins in the x-direction and 64 pixels in that direction, 16 bins in the y direction and 128 pixels in the y direction:

bx = 8 pixels
by = 8 pixels

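This leaves bz, for which I actually need more data, because I don't know the full range of your gradient orientation (i.e. lowest to highest possible value), but if that range is rg then:

bz = rg/9

With the 0-180 degree range and 9 bins described in the edit to the question, that gives bz = 20 degrees.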

In general, the bandwidth in any direction equals the full available range in that direction divided by the number of bins in that direction.
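As a small illustration of that rule, the sketch below computes the three bandwidths for the window in the question. The 180 degree orientation range is taken from the question's edit; the helper names and the convention that bin i is centred at (i + 0.5) * b are assumptions.

```python
def bandwidth(full_range, n_bins):
    """Distance between neighbouring bin centres along one axis."""
    return full_range / n_bins


def bin_centres(full_range, n_bins):
    """Bin centres, assuming bin i is centred at (i + 0.5) * bandwidth."""
    b = bandwidth(full_range, n_bins)
    return [(i + 0.5) * b for i in range(n_bins)]


bx = bandwidth(64, 8)     # pixels between cell centres along x
by = bandwidth(128, 16)   # pixels between cell centres along y
bz = bandwidth(180, 9)    # degrees between orientation bin centres
z_centres = bin_centres(180, 9)
```

Under these assumptions the cell centres along x lie at 4, 12, ..., 60 pixels and the orientation bin centres at 10, 30, ..., 170 degrees.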

For a good explanation of trilinear interpolation with pictures look at the link in whoplisp's answer.
