Viola-Jones' face detection claims 180k features

Question

I've been implementing an adaptation of Viola-Jones' face detection algorithm. The technique relies upon placing a subframe of 24x24 pixels within an image, and subsequently placing rectangular features inside it in every position with every size possible.

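These features can consist of two, three or four rectangles. The paper presents an example figure in which (A) and (B) are two-rectangle features, (C) is a three-rectangle feature and (D) is a four-rectangle feature.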

They claim the exhaustive set is more than 180k (section 2):

"Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete."

The following statements are not explicitly stated in the paper, so they are assumptions on my part:

  1. There are only 2 two-rectangle features, 2 three-rectangle features and 1 four-rectangle feature. The logic behind this is that we are observing the difference between the highlighted rectangles, not explicitly the color or luminance or anything of that sort.
  2. We cannot define feature type A as a 1x1 pixel block; it must be at least 1x2 pixels. Likewise, type D must be at least 2x2 pixels, and the same rule applies to the other features.
  3. We cannot define feature type A as a 1x3 pixel block, because the middle pixel cannot be partitioned, and subtracting it from itself is identical to a 1x2 pixel block; this feature type is only defined for even widths. Likewise, the width of feature type C must be divisible by 3, and the same rule applies to the other features.
  4. We cannot define a feature with a width and/or height of 0. Therefore, we iterate x and y to 24 minus the size of the feature.

Based upon these assumptions, I've counted the exhaustive set:

#include <cstdio>

int main()
{
    const int frameSize = 24;
    const int features = 5;
    // All five feature types, as {base width, base height}:
    const int feature[features][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

    int count = 0;
    // Each feature:
    for (int i = 0; i < features; i++) {
        int sizeX = feature[i][0];
        int sizeY = feature[i][1];
        // Each position:
        for (int x = 0; x <= frameSize-sizeX; x++) {
            for (int y = 0; y <= frameSize-sizeY; y++) {
                // Each size fitting within the frameSize:
                for (int width = sizeX; width <= frameSize-x; width+=sizeX) {
                    for (int height = sizeY; height <= frameSize-y; height+=sizeY) {
                        count++;
                    }
                }
            }
        }
    }
    std::printf("%d\n", count);
    return 0;
}

The result is 162,336.
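The same total can also be checked by hand. The derivation below assumes the counting rules above: sizes are multiples of the base shape, and a w×h feature fits in (25-w)(25-h) positions. Positions and sizes along x and y are independent, so each base shape $(s_x, s_y)$ contributes a product of two one-axis sums $S(s)$. The factor 2 covers the rotated pairs 2x1/1x2 and 3x1/1x3.

```latex
\begin{aligned}
N(s_x, s_y) &= S(s_x)\,S(s_y), \qquad S(s) = \sum_{k=1}^{\lfloor 24/s \rfloor} (25 - ks) \\
S(1) &= 24 + 23 + \dots + 1 = 300 \\
S(2) &= 23 + 21 + \dots + 1 = 144 \\
S(3) &= 22 + 19 + \dots + 1 = 92 \\
N &= 2\,S(2)S(1) + 2\,S(3)S(1) + S(2)^2 \\
  &= 2 \cdot 144 \cdot 300 + 2 \cdot 92 \cdot 300 + 144^2 \\
  &= 86400 + 55200 + 20736 = 162336
\end{aligned}
```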

The only way I found to approximate the "over 180,000" Viola & Jones speak of is to drop assumption #4 and introduce bugs into the code. This involves changing the two innermost loop headers, respectively, to:

for (int width = 0; width < frameSize-x; width+=sizeX)
for (int height = 0; height < frameSize-y; height+=sizeY)

The result is then 180,625. (Note that this will effectively prevent the features from ever touching the right and/or bottom of the subframe.)

Now of course the question: have they made a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I seeing it the wrong way?

Answer

On closer inspection, your code looks correct to me, which makes one wonder whether the original authors had an off-by-one bug. I guess someone ought to look at how OpenCV implements it!

Nonetheless, one suggestion to make it easier to understand is to flip the order of the for loops by going over all sizes first, then looping over the possible locations given the size:

#include <stdio.h>
int main()
{
    int i, x, y, sizeX, sizeY, width, height, count, c;

    /* All five shape types */
    const int features = 5;
    const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};
    const int frameSize = 24;

    count = 0;
    /* Each shape */
    for (i = 0; i < features; i++) {
        sizeX = feature[i][0];
        sizeY = feature[i][1];
        printf("%dx%d shapes:\n", sizeX, sizeY);

        /* each size (multiples of basic shapes) */
        for (width = sizeX; width <= frameSize; width+=sizeX) {
            for (height = sizeY; height <= frameSize; height+=sizeY) {
                printf("\tsize: %dx%d => ", width, height);
                c=count;

                /* each possible position given size */
                for (x = 0; x <= frameSize-width; x++) {
                    for (y = 0; y <= frameSize-height; y++) {
                        count++;
                    }
                }
                printf("count: %d\n", count-c);
            }
        }
    }
    printf("%d\n", count);

    return 0;
}

The result is the same as before: 162,336.

To verify it, I tested the case of a 4x4 window and manually checked all cases (easy to count, since the 1x2/2x1 and 1x3/3x1 shapes are the same, only rotated by 90 degrees):

2x1 shapes:
        size: 2x1 => count: 12
        size: 2x2 => count: 9
        size: 2x3 => count: 6
        size: 2x4 => count: 3
        size: 4x1 => count: 4
        size: 4x2 => count: 3
        size: 4x3 => count: 2
        size: 4x4 => count: 1
1x2 shapes:
        size: 1x2 => count: 12             +-----------------------+
        size: 1x4 => count: 4              |     |     |     |     |
        size: 2x2 => count: 9              |     |     |     |     |
        size: 2x4 => count: 3              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x4 => count: 2              |     |     |     |     |
        size: 4x2 => count: 3              +-----+-----+-----+-----+
        size: 4x4 => count: 1              |     |     |     |     |
3x1 shapes:                                |     |     |     |     |
        size: 3x1 => count: 8              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x3 => count: 4              |     |     |     |     |
        size: 3x4 => count: 2              +-----------------------+
1x3 shapes:
        size: 1x3 => count: 8                  Total Count = 136
        size: 2x3 => count: 6
        size: 3x3 => count: 4
        size: 4x3 => count: 2
2x2 shapes:
        size: 2x2 => count: 9
        size: 2x4 => count: 3
        size: 4x2 => count: 3
        size: 4x4 => count: 1
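Summing the listed counts per shape type reproduces the total shown next to the grid:

```latex
\begin{aligned}
2\times1 &: 12 + 9 + 6 + 3 + 4 + 3 + 2 + 1 = 40 \\
1\times2 &: 12 + 4 + 9 + 3 + 6 + 2 + 3 + 1 = 40 \\
3\times1 &: 8 + 6 + 4 + 2 = 20 \\
1\times3 &: 8 + 6 + 4 + 2 = 20 \\
2\times2 &: 9 + 3 + 3 + 1 = 16 \\
\text{total} &: 40 + 40 + 20 + 20 + 16 = 136
\end{aligned}
```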
