Implementing Face Recognition using Local Descriptors (Unsupervised Learning)


Question




I'm trying to implement a face recognition algorithm using Python. I want to be able to receive a directory of images, and compute pair-wise distances between them, when short distances should hopefully correspond to the images belonging to the same person. The ultimate goal is to cluster images and perform some basic face identification tasks (unsupervised learning).

Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some int d) and then figure out a metric in which two faces belonging to the same person will indeed have a short distance between them.

I have a face detection algorithm which detects the face, crops the image and performs some basic pre-processing, so the images I'm feeding to the algorithm are grayscale and equalized (see below).

For the "face signature" part, I've tried two approaches which I read about in several publications:

  1. Taking the histogram of the LBP (Local Binary Pattern) of the entire (processed) image
  2. Calculating SIFT descriptors at 7 facial landmark points (right of mouth, left of mouth, etc.), which I identify per image using an external application. The signature is the concatenation of the square root of the descriptors (this results in a much higher dimension, but for now performance is not a problem).
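To make approach 1 concrete, here is a plain-NumPy sketch of the basic 8-neighbour LBP histogram signature (`lbp_histogram` is a made-up helper name; no uniform patterns or spatial tiling, which real LBP face pipelines normally add, e.g. via `skimage.feature.local_binary_pattern`):

```python
import numpy as np

def lbp_histogram(gray):
    """gray: 2-D uint8 image. Returns a 256-bin, L1-normalised LBP histogram."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    codes = np.zeros_like(center)
    # 8 neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Usage: signature = lbp_histogram(cropped_face)
```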

For the comparison of two signatures, I'm using OpenCV's compareHist function (see here), trying out several different distance metrics (chi-square, Euclidean, etc.).
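For reference, OpenCV's `HISTCMP_CHISQR` metric is the classic asymmetric chi-square statistic; a NumPy equivalent looks like this (`chi_square` is a hypothetical helper name, shown only to make the comparison explicit):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """OpenCV-style chi-square distance: sum((h1 - h2)^2 / h1), skipping
    empty bins of h1. 0 means identical; larger means more different."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    mask = h1 > eps
    return np.sum((h1[mask] - h2[mask]) ** 2 / h1[mask])
```

Note the measure is asymmetric in `h1` and `h2`, which occasionally matters when ranking matches.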

I know that face recognition is a hard task, let alone without any training, so I'm not expecting great results. But all I'm getting so far seems completely random. For example, when calculating distances from the image on the far right against the rest of the images, I'm getting that she is most similar to 4 Bill Clintons (...!).

I have read in this great presentation that it's popular to carry out a "metric learning" procedure on a test set, which should significantly improve results. However, it does say in the presentation and elsewhere that "regular" distance measures should also get OK results, so before I try this out I want to understand why what I'm doing gets me nothing.

In conclusion, my questions, which I'd love to get any sort of help on:

  1. One improvement I thought of would be to perform LBP only on the actual face, and not the corners and everything else that might introduce noise into the signature. How can I mask out the parts which are not the face before calculating LBP? I'm using OpenCV for this part too.

  2. I'm fairly new to computer vision; how would I go about "debugging" my algorithm to figure out where things go wrong? Is this possible?

  3. In the unsupervised setting, is there any other approach (which is not local descriptors + computing distances) that could work, for the task of clustering faces?

  4. Is there anything else in the OpenCV module that maybe I haven't thought of that might be helpful? It seems like all the algorithms there require training and are not useful in my case - the algorithm needs to work on images which are completely new.

Thanks in advance.

Solution

What you are looking for is unsupervised feature extraction - take a bunch of unlabeled images and find the most important features describing these images.

The state-of-the-art methods for unsupervised feature extraction are all based on (convolutional) neural networks. Have a look at autoencoders (http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity) or Restricted Boltzmann Machines (RBMs).
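The autoencoder idea can be sketched in a few lines of plain NumPy; this toy single-hidden-layer version (all names hypothetical, linear decoder, plain gradient descent) only illustrates the principle of training to reconstruct the input and then keeping the hidden activations as an unsupervised feature vector. A real face pipeline would train a convolutional autoencoder in one of the frameworks named below.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden=16, lr=0.1, epochs=200):
    """X: (n, d) data scaled to [0, 1]. Returns the encoder parameters."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # encode
        Y = H @ W2 + b2                       # decode (linear output)
        err = Y - X                           # gradient of 0.5 * ||Y - X||^2
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)      # backprop through tanh
        gW1 = X.T @ dH / n; gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1                             # decoder is discarded after training

def encode(X, W1, b1):
    """Hidden activations = the learned feature vector for each image."""
    return np.tanh(X @ W1 + b1)
```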

You could also take an existing face recognition network such as DeepFace (https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf), use only its feature layers, and use the distances between these feature vectors to group similar faces together.
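The grouping step can be sketched as follows; `threshold_cluster` is a deliberately naive greedy baseline invented for illustration (a real pipeline would use hierarchical clustering or DBSCAN on the distance matrix):

```python
import numpy as np

def pairwise_distances(embeddings):
    """embeddings: (n, d) array. Returns the (n, n) Euclidean distance matrix."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def threshold_cluster(embeddings, threshold):
    """Greedy grouping: each face joins the first cluster whose representative
    (its first member) lies within `threshold`; otherwise it starts a new one."""
    dist = pairwise_distances(embeddings)
    labels = -np.ones(len(embeddings), dtype=int)
    reps = []  # index of the first member of each cluster
    for i in range(len(embeddings)):
        for c, r in enumerate(reps):
            if dist[i, r] < threshold:
                labels[i] = c
                break
        else:
            labels[i] = len(reps)
            reps.append(i)
    return labels
```

The quality of the result depends almost entirely on how well the feature vectors separate identities; the clustering itself is the easy part.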

I'm afraid that OpenCV is not well suited for this task; you might want to look at Caffe, Theano, TensorFlow or Keras.
