Google Mobile Vision: Poor FaceDetector performance without CameraSource


Problem description


Right now, our application is running Snapdragon SDK successfully. We are trying to implement FaceDetector from Vision 8.3.0 on our project, in order to increase the number of compatible devices. We can't use CameraSource, as we rely on a custom camera + surface to provide certain functionality. We want to reuse as much code as possible, and Snapdragon SDK is doing amazingly with our current implementation.

Workflow is as follows:

1) Retrieve camera preview

2) Transform the incoming byte array to a bitmap (for some reason, we haven't managed to get ByteBuffers working: image size, rotation and NV21 image format are provided and verified, but no faces are found). The bitmap is a global variable, initialized once inside the processing thread to avoid slowdowns from allocations.

3) Feed the detector via receiveFrame (sketched below)
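
For reference, the detector side looks roughly like this (a condensed sketch; FaceTrackerFactory stands in for our own MultiProcessor.Factory<Face> implementation, and context/bitmap/rotation come from the surrounding setup):

// Built once, with landmarks and classification disabled
FaceDetector detector = new FaceDetector.Builder(context)
        .setLandmarkType(FaceDetector.NO_LANDMARKS)
        .setClassificationType(FaceDetector.NO_CLASSIFICATIONS)
        .build();

// receiveFrame hands detections to whatever processor is attached
detector.setProcessor(new MultiProcessor.Builder<>(new FaceTrackerFactory()).build());

// Per frame, on the processing thread
Frame frame = new Frame.Builder().setBitmap(bitmap).setRotation(rotation).build();
detector.receiveFrame(frame);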

Results so far aren't good enough. Detection is way too slow (2-3 seconds) and inaccurate, even though we have disabled landmarks and classifications.

The question is: Is it possible to replicate CameraSource + Detector performance without using the former? Is it mandatory to use CameraSource to make it work with live input?

Thanks in advance!

EDIT

Following pm0733464's recommendations below, I'm trying to use a ByteBuffer instead of a Bitmap. These are the steps I follow:

// Initialize variables
// Mat is part of the OpenCV SDK
Mat currentFrame = new Mat(cameraPreviewHeight + cameraPreviewHeight / 2, cameraPreviewWidth, CvType.CV_8UC1);
Mat yuvMat = new Mat(cameraPreviewHeight + cameraPreviewHeight / 2, cameraPreviewWidth, CvType.CV_8UC1);

// Load current frame
yuvMat.put(0, 0, data);

// Convert the frame from NV21 to RGB, then to grayscale
Imgproc.cvtColor(yuvMat, currentFrame, Imgproc.COLOR_YUV420sp2RGB);
Imgproc.cvtColor(currentFrame, currentFrame, Imgproc.COLOR_BGR2GRAY);

From here, the byte array creation:

// Initialize grayscale byte array
byte[] grayscaleBytes = new byte[data.length];

// Extract grayscale data
currentFrame.get(0, 0, grayscaleBytes);

// Allocate ByteBuffer
ByteBuffer buffer = ByteBuffer.allocateDirect(grayscaleBytes.length);

// Wrap grayscale byte array
buffer.wrap(grayscaleBytes);

// Create frame
// rotation is calculated before
Frame currentGoogleFrame = new Frame.Builder().setImageData(buffer, currentFrame.cols(), currentFrame.rows(), ImageFormat.NV21).setRotation(rotation).build();

Constructing frames this way results in no faces found. However, using bitmaps it works as expected:

if(bitmap == null) {
    // Bitmap allocation
    bitmap = Bitmap.createBitmap(currentFrame.cols(), currentFrame.rows(), Bitmap.Config.ARGB_8888);
}

// Copy grayscale contents
org.opencv.android.Utils.matToBitmap(currentFrame, bitmap);

// Scale down to improve performance
Matrix scaleMatrix = new Matrix();
scaleMatrix.postScale(scaleFactor, scaleFactor);

// Recycle before creating the scaled bitmap
if(scaledBitmap != null) {
    scaledBitmap.recycle();
}

// Generate scaled bitmap
scaledBitmap = Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), scaleMatrix, true);

// Create frame
// The same rotation as before is still used
if(scaledBitmap != null) {
    Frame currentGoogleFrame = new Frame.Builder().setBitmap(scaledBitmap).setRotation(rotation).build();
}
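
A side note on the ByteBuffer attempt above: ByteBuffer.wrap is a static method, so buffer.wrap(grayscaleBytes) creates a new buffer and discards it, leaving the direct buffer that was handed to the Frame zero-filled. Since the camera preview already delivers NV21 data, a plausible variant is to wrap the raw preview bytes directly (previewWidth and previewHeight are assumed to hold the preview dimensions from the camera parameters):

// Wrap the raw NV21 preview bytes; no copy or color conversion needed
ByteBuffer nv21Buffer = ByteBuffer.wrap(data);

// Build the frame from the full-size NV21 buffer
Frame nv21Frame = new Frame.Builder()
        .setImageData(nv21Buffer, previewWidth, previewHeight, ImageFormat.NV21)
        .setRotation(rotation)
        .build();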

Solution

Having detection take 2-3 seconds isn't typical. Using CameraSource isn't necessary to get the best performance. What hardware are you using? Can you provide more specifics?

Some aspects of face detection are speed vs. accuracy trade-offs.

Speed:

  1. Try using lower resolution images, if possible. Face detection should work fine at 640x480, for example. The face detector code does downsample large images before running detection, although this takes additional time compared to receiving a lower-resolution original.

  2. Using ByteBuffers rather than Bitmaps will be a bit faster. The first portion of the buffer should be just a grayscale image (no color info).

  3. As you noted above, disabling landmarks and classification will make it faster (see the configuration sketch after this list).

  4. In a future release, there will be a "min face size" option. Setting the min size higher makes the face detection faster (at the accuracy trade-off of not detecting smaller faces).

  5. Setting the mode to "fast" will make it faster (at the accuracy trade-off of not detecting non-frontal faces).

  6. Using the "prominent face only" option will be faster, but it only detects a single large face (at least 35% the width of the image).
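
Putting the speed-oriented settings together (items 3, 5 and 6 above), the builder configuration looks roughly like this:

// A FaceDetector tuned for speed over coverage
FaceDetector fastDetector = new FaceDetector.Builder(context)
        .setMode(FaceDetector.FAST_MODE)                         // item 5: frontal faces only
        .setLandmarkType(FaceDetector.NO_LANDMARKS)              // item 3
        .setClassificationType(FaceDetector.NO_CLASSIFICATIONS)  // item 3
        .setProminentFaceOnly(true)                              // item 6: one large face
        .build();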

Accuracy:

  1. Enabling landmarks will allow the pose angles to be computed more accurately.

  2. Setting the mode to "accurate" will detect faces at a wider range of angles (e.g., faces in profile). However, this takes more time.

  3. Lacking the "min face size" option mentioned above, only faces larger than 10% the width of the image are detected by default (at 640x480, that means a face at least 64 pixels wide). Smaller faces will not be detected. Changing this setting in the future will help to detect smaller faces. However, note that detecting smaller faces takes longer.

  4. Using a higher resolution image will be more accurate than a lower resolution image. For example, some faces in a 320x240 image might be missed that would have been detected if the image were 640x480. The lower the "min face size" you set, the higher the resolution you need to detect faces of that size.

  5. Make sure that you have the rotation right. The face won't be detected if it is upside down, for example. You should call the face detector again with a rotated image if you want to detect upside down faces.
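
A sketch of that retry, reusing the same image data with the rotation shifted by two quarter turns (Frame's rotation constants ROTATION_0 through ROTATION_270 count quarter turns; buffer, width, height and rotation are assumed from the capture setup):

// First pass with the camera-derived rotation
Frame upright = new Frame.Builder()
        .setImageData(buffer, width, height, ImageFormat.NV21)
        .setRotation(rotation)
        .build();
SparseArray<Face> faces = detector.detect(upright);

// If nothing was found, retry flipped 180 degrees
if (faces.size() == 0) {
    Frame flipped = new Frame.Builder()
            .setImageData(buffer, width, height, ImageFormat.NV21)
            .setRotation((rotation + 2) % 4)
            .build();
    faces = detector.detect(flipped);
}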

Also, garbage collection time can be a factor if you're creating a lot of Bitmaps. An advantage of using ByteBuffer is that you can reuse the same buffer repeatedly without incurring per-image GC overhead that you would have encountered if you had used a Bitmap per image. CameraSource has this advantage, since it uses only a few buffers.
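
As a sketch of that reuse pattern (previewWidth, previewHeight, rotation and detector are assumed to be fields of the enclosing class):

// Allocated once and refilled per frame, so no per-frame garbage
private ByteBuffer frameBuffer;

@Override
public void onPreviewFrame(byte[] data, Camera camera) {
    if (frameBuffer == null || frameBuffer.capacity() < data.length) {
        frameBuffer = ByteBuffer.allocateDirect(data.length);
    }
    frameBuffer.rewind();
    frameBuffer.put(data);

    Frame frame = new Frame.Builder()
            .setImageData(frameBuffer, previewWidth, previewHeight, ImageFormat.NV21)
            .setRotation(rotation)
            .build();
    detector.receiveFrame(frame);

    // Hand the byte[] back to the camera (assuming setPreviewCallbackWithBuffer)
    camera.addCallbackBuffer(data);
}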
