Viola-Jones' face detection claims 180k features


Question

I've been implementing an adaptation of Viola-Jones' face detection algorithm. The technique relies upon placing a subframe of 24x24 pixels within an image, and subsequently placing rectangular features inside it in every position with every size possible.

These features can consist of two, three or four rectangles. The following example is presented.

They claim the exhaustive set is more than 180k (section 2):


Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete.

The following statements are not explicitly stated in the paper, so they are assumptions on my part:


  1. There are only 2 two-rectangle features, 2 three-rectangle features and 1 four-rectangle feature. The logic behind this is that we are observing the difference between the highlighted rectangles, not explicitly the color or luminance or anything of that sort.
  2. We cannot define feature type A as a 1x1 pixel block; it must be at least 1x2 pixels. Also, type D must be at least 2x2 pixels, and this rule applies accordingly to the other features.
  3. We cannot define feature type A as a 1x3 pixel block, as the middle pixel cannot be partitioned, and subtracting it from itself is identical to a 1x2 pixel block; this feature type is only defined for even widths. Also, the width of feature type C must be divisible by 3, and this rule applies accordingly to the other features.
  4. We cannot define a feature with a width and/or height of 0. Therefore, we iterate x and y to 24 minus the size of the feature.

Based upon these assumptions, I've counted the exhaustive set:

#include <stdio.h>

int main(void)
{
    const int frameSize = 24;
    const int features = 5;
    // All five feature types, as {base width, base height}:
    const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

    int count = 0;
    // Each feature:
    for (int i = 0; i < features; i++) {
        int sizeX = feature[i][0];
        int sizeY = feature[i][1];
        // Each position:
        for (int x = 0; x <= frameSize-sizeX; x++) {
            for (int y = 0; y <= frameSize-sizeY; y++) {
                // Each size fitting within the frameSize:
                for (int width = sizeX; width <= frameSize-x; width+=sizeX) {
                    for (int height = sizeY; height <= frameSize-y; height+=sizeY) {
                        count++;
                    }
                }
            }
        }
    }

    printf("%d\n", count);
    return 0;
}


The result is 162,336.

The only way I found to approximate the "over 180,000" that Viola & Jones speak of is to drop assumption #4 and introduce bugs into the code. This involves changing the two innermost loop headers to:

for (int width = 0; width < frameSize-x; width+=sizeX)
for (int height = 0; height < frameSize-y; height+=sizeY)

The result is then 180,625. (Note that this will effectively prevent the features from ever touching the right and/or bottom of the subframe.)

Now of course the question: have they made a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I seeing it the wrong way?

Recommended answer

Upon closer look, your code looks correct to me, which makes one wonder whether the original authors had an off-by-one bug. I guess someone ought to look at how OpenCV implements it!

Nonetheless, one suggestion to make it easier to understand is to flip the order of the for loops by going over all sizes first, then looping over the possible locations given the size:

#include <stdio.h>
int main()
{
    int i, x, y, sizeX, sizeY, width, height, count, c;

    /* All five shape types */
    const int features = 5;
    const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};
    const int frameSize = 24;

    count = 0;
    /* Each shape */
    for (i = 0; i < features; i++) {
        sizeX = feature[i][0];
        sizeY = feature[i][1];
        printf("%dx%d shapes:\n", sizeX, sizeY);

        /* each size (multiples of basic shapes) */
        for (width = sizeX; width <= frameSize; width+=sizeX) {
            for (height = sizeY; height <= frameSize; height+=sizeY) {
                printf("\tsize: %dx%d => ", width, height);
                c=count;

                /* each possible position given size */
                for (x = 0; x <= frameSize-width; x++) {
                    for (y = 0; y <= frameSize-height; y++) {
                        count++;
                    }
                }
                printf("count: %d\n", count-c);
            }
        }
    }
    printf("%d\n", count);

    return 0;
}

This gives the same result as before: 162,336.

To verify it, I tested the case of a 4x4 window and manually checked all cases (easy to count since 1x2/2x1 and 1x3/3x1 shapes are the same only 90 degrees rotated):

2x1 shapes:
        size: 2x1 => count: 12
        size: 2x2 => count: 9
        size: 2x3 => count: 6
        size: 2x4 => count: 3
        size: 4x1 => count: 4
        size: 4x2 => count: 3
        size: 4x3 => count: 2
        size: 4x4 => count: 1
1x2 shapes:
        size: 1x2 => count: 12             +-----------------------+
        size: 1x4 => count: 4              |     |     |     |     |
        size: 2x2 => count: 9              |     |     |     |     |
        size: 2x4 => count: 3              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x4 => count: 2              |     |     |     |     |
        size: 4x2 => count: 3              +-----+-----+-----+-----+
        size: 4x4 => count: 1              |     |     |     |     |
3x1 shapes:                                |     |     |     |     |
        size: 3x1 => count: 8              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x3 => count: 4              |     |     |     |     |
        size: 3x4 => count: 2              +-----------------------+
1x3 shapes:
        size: 1x3 => count: 8                  Total Count = 136
        size: 2x3 => count: 6
        size: 3x3 => count: 4
        size: 4x3 => count: 2
2x2 shapes:
        size: 2x2 => count: 9
        size: 2x4 => count: 3
        size: 4x2 => count: 3
        size: 4x4 => count: 1
