HOG Trilinear Interpolation of Histogram Bins

Question

I am working on Histogram of Oriented Gradients (HOG) features and am trying to implement the trilinear interpolation of histogram bins described in Dalal's PhD thesis. He explains the interpolation process as quoted below:

EDIT: Roughly speaking, HOG features are extracted from a 64x128 pixel window that is divided into blocks. Each block consists of 2x2 cells, and a cell is an 8x8 pixel area. Extraction starts by calculating the first-order derivatives of the image; the gradient orientation and magnitude at each pixel are then computed. For each 8x8 pixel cell within a block, an orientation histogram is computed: each pixel votes with its gradient magnitude according to its orientation, and the vote is interpolated between the neighbouring bin centres in both orientation and position. The histogram contains 9 bins covering 0-180 degrees in steps of 20 degrees. An overall depiction of the algorithm can be seen here: http://4.bp.blogspot.com/_7NBDeKCsVHg/TKBbldI8GmI/AAAAAAAAAG0/G-OXUz1ouPQ/s1600/a1.bmp
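As a rough sketch of the first stage described above (per-pixel gradient magnitude and orientation), the following assumes a grayscale image stored as a list of rows and a centred [-1, 0, 1] difference. The function name `gradient_polar` and the border handling are illustrative assumptions, not something stated in the thesis quote.

```python
import math

def gradient_polar(img):
    """Return (magnitude, orientation) grids for a 2-D list of pixel values.

    Orientation is unsigned: it is folded into the 0-180 degree range.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Centred differences; at the border the pixel itself stands in
            # for the missing neighbour (an assumed, simple border policy).
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

# A horizontal intensity ramp: the gradient points along x.
mag, ang = gradient_polar([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
```

Each pixel's `mag` value is the weight w that is later distributed into the histogram, and its `ang` value is the z coordinate of the vote.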

We first describe linear interpolation in a one-dimensional space and then extend it to 3-D. Let h be a histogram with inter-bin distance (bandwidth) b. h(x) denotes the value of the histogram for the bin centred at x. Assume that we want to interpolate a weight w at point x into the histogram. Let x1 and x2 be the two nearest neighbouring bins of the point x such that x1 ≤ x < x2. Linear interpolation distributes the weight w into two nearest neighbours as follows

$$h(x_1) \leftarrow h(x_1) + w\left(1 - \frac{x - x_1}{b}\right)$$

$$h(x_2) \leftarrow h(x_2) + w\left(\frac{x - x_1}{b}\right)$$
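A minimal sketch of the 1-D case may help: each of the two neighbouring bins receives a share of w that grows as x gets closer to its centre, and the two shares sum to w. The function name `vote_1d`, the `first_center` parameter (the centre of bin 0) and the choice to drop votes that fall outside the histogram are assumptions made for illustration.

```python
import math

def vote_1d(hist, x, w, b, first_center=0.0):
    """Distribute weight w at position x into the two nearest bins of hist."""
    # Fractional bin coordinate: bin i is centred at first_center + i * b.
    t = (x - first_center) / b
    i1 = math.floor(t)   # index of the left neighbour x1
    frac = t - i1        # equals (x - x1) / b, in [0, 1)
    for idx, share in ((i1, 1.0 - frac), (i1 + 1, frac)):
        if 0 <= idx < len(hist):   # assumed policy: out-of-range votes are dropped
            hist[idx] += w * share
    return hist

# 9 bins of width 20 with centres 10, 30, ..., 170; vote 2.0 at x = 25.
hist = vote_1d([0.0] * 9, x=25.0, w=2.0, b=20.0, first_center=10.0)
```

Here x = 25 lies three quarters of the way from the centre at 10 to the centre at 30, so bin 1 receives 1.5 and bin 0 receives 0.5.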

Let w at the 3-D point x = [x, y, z] be the weight to be interpolated. Let x1 and x2 be the two corner vectors of the histogram cube containing x, where in each component x1 ≤ x < x2. Assume that the bandwidth of the histogram along the x, y and z axis is given by b = [bx, by, bz]. Trilinear interpolation distributes the weight w to the 8 surrounding bin centres as follows:

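$$
\begin{aligned}
h(x_1, y_1, z_1) &\leftarrow h(x_1, y_1, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_1, z_2) &\leftarrow h(x_1, y_1, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_1) &\leftarrow h(x_1, y_2, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_1) &\leftarrow h(x_2, y_1, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_2) &\leftarrow h(x_1, y_2, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_2) &\leftarrow h(x_2, y_1, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_1) &\leftarrow h(x_2, y_2, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_2) &\leftarrow h(x_2, y_2, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right)
\end{aligned}
$$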
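The 3-D case can be sketched the same way: along each axis the weight is split linearly between the lower and upper bin, and each of the 8 corner bins receives the product of its three per-axis shares. The function name `vote_trilinear`, the `first_center` argument and the mid-bin centres used in the example (4 pixels and 10 degrees for bin 0) are assumptions. Dropping out-of-range votes is a simplification as well; an implementation may instead wrap the orientation axis, since 0 and 180 degrees denote the same unsigned orientation.

```python
import itertools
import math

def vote_trilinear(hist, point, w, bandwidth, first_center):
    """Add weight w at point = (x, y, z) to the 8 surrounding bins of hist.

    hist is a nested list indexed as hist[ix][iy][iz].
    """
    shape = (len(hist), len(hist[0]), len(hist[0][0]))
    lower, frac = [], []
    for p, b, c0 in zip(point, bandwidth, first_center):
        t = (p - c0) / b        # fractional bin coordinate along this axis
        i1 = math.floor(t)      # index of the lower corner (x1, y1 or z1)
        lower.append(i1)
        frac.append(t - i1)     # e.g. (x - x1) / bx
    # Visit the 8 cube corners: 0 selects the lower bin, 1 the upper bin.
    for corner in itertools.product((0, 1), repeat=3):
        idx = [i + d for i, d in zip(lower, corner)]
        if not all(0 <= k < n for k, n in zip(idx, shape)):
            continue            # simplification: out-of-range votes are dropped
        share = 1.0
        for f, d in zip(frac, corner):
            share *= f if d else 1.0 - f
        hist[idx[0]][idx[1]][idx[2]] += w * share
    return hist

# 8 x 16 cells of 8 pixels and 9 orientation bins of 20 degrees.
hist = [[[0.0] * 9 for _ in range(16)] for _ in range(8)]
vote_trilinear(hist, point=(10.0, 20.0, 35.0), w=1.0,
               bandwidth=(8.0, 8.0, 20.0), first_center=(4.0, 4.0, 10.0))
```

In this example the pixel sits exactly on a cell centre in y, so only four bins receive a non-zero share, and the shares still sum to w.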

We compute a histogram for each cell, and every pixel contributes its magnitude value to the histogram. What I understand from the formulation is that x and y represent the location of the cells in the detection window and z is the bin number. In a 64x128 detection window there are 8x16 cells and 9 orientation bins, so our histogram is represented as h(8,16,9). If the above statements are correct, do (x1,y1) and (x2,y2) represent the previous and next cells respectively? Do z1 and z2 mean the previous and next orientation bins? What about the bandwidth b=[bx, by, bz]?

I'd really appreciate it if someone could clarify these issues.

Thanks.

Solution

Think of (x1, y1, z1) and (x2, y2, z2) as two points spanning a cube that surrounds the point (x,y,z) for which you want to interpolate a value of h. The set of eight points (x1, y1, z1), (x2, y1, z1), (x1, y2, z1), (x1, y1, z2), (x2, y2, z1), (x2, y1, z2), (x1, y2, z2), (x2, y2, z2) forms the complete cube. So trilinear interpolation between (x1, y1, z1) and (x2, y2, z2) actually means interpolation between the 8 points in the 3D histogram space surrounding the point you are interested in! Now to your questions:

(x1, y1), (x2, y2), (x1, y2) and (x2, y1) represent the centers of bins in the (x, y) plane. In your case these would be the centers of the four cells nearest to the pixel.

z1 and z2 represent two bin levels in the orientation direction, as you say. Combined with the four points in the image plane this gives you a total of 8 bins.

The bandwidth b=[bx, by, bz] is basically the distance between the centers of neighbouring bins in the x, y and z direction. In your case, with 8 bins in the x-direction and 64 pixels in that direction, 16 bins in the y direction and 128 pixels in the y direction:

bx = 8 pixels
by = 8 pixels

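This leaves bz, which depends on the full range of the quantity binned along z, here the gradient orientation (i.e. lowest to highest possible value). If that range is rg, then:

bz = rg/9

With the 0-180 degree range and 9 bins given in the question, rg = 180 degrees, so bz = 20 degrees.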

In general, the bandwidth in any direction equals the full available range in that direction divided by the number of bins in that direction.
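Applied to the numbers in the question, the rule works out as in the short sketch below. The helper names `bandwidth` and `bin_centers` are hypothetical, and placing each centre in the middle of its bin is an assumption about the binning convention.

```python
def bandwidth(full_range, n_bins):
    """Distance between neighbouring bin centres along one axis."""
    return full_range / n_bins

def bin_centers(full_range, n_bins):
    """Bin centres, assuming bins tile [0, full_range) and are centred mid-bin."""
    b = bandwidth(full_range, n_bins)
    return [(i + 0.5) * b for i in range(n_bins)]

bx = bandwidth(64, 8)     # 8.0 pixels (64-pixel width, 8 cells)
by = bandwidth(128, 16)   # 8.0 pixels (128-pixel height, 16 cells)
bz = bandwidth(180, 9)    # 20.0 degrees (0-180 degrees, 9 bins)
z_centers = bin_centers(180, 9)   # 10.0, 30.0, ..., 170.0
```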

For a good explanation of trilinear interpolation with pictures look at the link in whoplisp's answer.
