How to manipulate on the fly YUV Camera frame efficiently in Android?

I'm adding a black (0) padding around Region of interest (center) of NV21 frame got from Android CameraPreview callbacks in a thread.

To avoid overhead of conversion to RGB/Bitmap and reverse, I'm trying to manipulate NV21 byte array directly but this involves nested loops which is also making preview/processing slow.

This is my run() method sending frames to detector after calling method blackNonROI.

public void run() {
    Frame outputFrame;
    ByteBuffer data;
    while (true) {
        synchronized (mLock) {

            while (mActive && (mPendingFrameData == null))
                try{ mLock.wait(); }catch(InterruptedException e){ return; }

            if (!mActive) { return; }

            // Region of Interest
            mPendingFrameData = blackNonROI(mPendingFrameData.array(),mPreviewSize.getWidth(),mPreviewSize.getHeight(),300,300);

            outputFrame = new Frame.Builder().setImageData(mPendingFrameData, mPreviewSize.getWidth(),mPreviewSize.getHeight(), ImageFormat.NV21).setId(mPendingFrameId).setTimestampMillis(mPendingTimeMillis).setRotation(mRotation).build();

            data = mPendingFrameData;
            mPendingFrameData = null;

        }

        try {
            mDetector.receiveFrame(outputFrame);
        } catch (Throwable t) {
        } finally {
            mCamera.addCallbackBuffer(data.array());
        }
    }
}

Following is the method blackNonROI

private ByteBuffer blackNonROI(byte[] yuvData, int width, int height, int roiWidth, int roiHeight){

    int hozMargin = (width - roiWidth) / 2;
    int verMargin = (height - roiHeight) / 2;

    // top/bottom of center
    for(int x=0; x<width; x++){
        for(int y=0; y<verMargin; y++)
            yuvData[y * width + x] = 0;
        for(int y=height-verMargin; y<height; y++)
            yuvData[y * width + x] = 0;
    }

    // left/right of center
    for(int y=verMargin; y<height-verMargin; y++){
        for (int x = 0; x < hozMargin; x++)
            yuvData[y * width + x] = 0;
        for (int x = width-hozMargin; x < width; x++)
            yuvData[y * width + x] = 0;
    }

    return ByteBuffer.wrap(yuvData);
}

Example output frame

Note that I'm not cropping the image, just padding black pixels around the specified center of the image to maintain coordinates for further activities. This works like it should, but it's not fast enough and causes lag in preview and frame processing.

  1. Can I further improve byte array update?
  2. Is time/place for calling blackNonROI fine?
  3. Any other way / lib for doing it more efficiently?
  4. My simple pixel iteration is so slow, how YUV/Bitmap libraries do complex things so fast? do they use GPU?

Edit:

I've replaced both for loops with the following code, and it's much faster now (please refer to greeble31's answer for details):

    int from, to;

    // full top padding
    from = 0;
    to = (verMargin-1)*width + width;
    Arrays.fill(yuvData,from,to,(byte)1);

    // full bottom padding
    from = (height-verMargin)*width;
    to = (height-1)*width + width;
    Arrays.fill(yuvData,from,to,(byte)1);

    for(int y=verMargin; y<height-verMargin; y++) {
        // left-middle padding
        from = y*width;
        to = y*width + hozMargin;
        Arrays.fill(yuvData,from,to,(byte)1);

        // right-middle padding
        from = y*width + width-hozMargin;
        to = y*width + width;
        Arrays.fill(yuvData,from,to,(byte)1);
    }

Solution

1. Yes. To understand why, let's take a look at the bytecode Android Studio produces for your "left/right of center" nested loop:

(Annotated excerpt from a release build of blackNonROI, AS 3.2.1):

:goto_27
sub-int v2, p2, p4         ;for(int y=verMargin; y<height-verMargin; y++)
if-ge v1, v2, :cond_45
const/4 v2, 0x0
:goto_2c
if-ge v2, p3, :cond_36     ;for (int x = 0; x < hozMargin; x++)
mul-int v3, v1, p1
add-int/2addr v3, v2
.line 759
aput-byte v0, p0, v3
add-int/lit8 v2, v2, 0x1
goto :goto_2c
:cond_36
sub-int v2, p1, p3 
:goto_38
if-ge v2, p1, :cond_42     ;for (int x = width-hozMargin; x < width; x++)
mul-int v3, v1, p1
add-int/2addr v3, v2
.line 761
aput-byte v0, p0, v3
add-int/lit8 v2, v2, 0x1
goto :goto_38
:cond_42
add-int/lit8 v1, v1, 0x1
goto :goto_27
.line 764
:cond_45                   ;all done with the for loops!

Without bothering to decipher this whole thing line-by-line, it is clear that each of your small, inner loops is performing:

  • 1 comparison
  • 1 integer multiplication
  • 1 addition
  • 1 store
  • 1 goto

That's a lot, when you consider that all that you really need this inner loop to do is set a certain number of successive array elements to 0.

Moreover, some of these bytecodes require multiple machine instructions to implement, so I wouldn't be surprised if you're looking at over 20 cycles, just to do a single iteration of one of the inner loops. (I haven't tested what this code looks like once it's compiled by the Dalvik VM, but I sincerely doubt it is smart enough to optimize the multiplications out of these loops.)

POSSIBLE FIXES

You could improve performance by eliminating some redundant calculations. For example, each inner loop is recalculating y * width each time. Instead, you could pre-calculate that offset, store it in a local variable (in the outer loop), and use that when calculating the indices.
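A minimal sketch of that hoisting fix, assuming the same parameters as the question's blackNonROI (the class and method names here are hypothetical, and only the "left/right of center" loop is shown, since that is the one whose bytecode is discussed above):

```java
public class RoiPadding {
    /** Hypothetical rewrite of the question's "left/right of center" loop:
     *  the row offset (y * width) is computed once per row instead of once
     *  per pixel, removing the per-iteration multiply from the inner loops. */
    static void zeroSideMargins(byte[] yuvData, int width, int height,
                                int roiWidth, int roiHeight) {
        int hozMargin = (width - roiWidth) / 2;
        int verMargin = (height - roiHeight) / 2;
        for (int y = verMargin; y < height - verMargin; y++) {
            int rowStart = y * width;              // hoisted out of the inner loops
            for (int x = 0; x < hozMargin; x++)
                yuvData[rowStart + x] = 0;         // left margin
            for (int x = width - hozMargin; x < width; x++)
                yuvData[rowStart + x] = 0;         // right margin
        }
    }
}
```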

When performance is absolutely critical, I will sometimes do this sort of buffer manipulation in native code. If you can be reasonably certain that mPendingFrameData is a DirectByteBuffer, this is an even more attractive option. The disadvantages are 1.) higher complexity, and 2.) less of a "safety net" if something goes wrong/crashes.

MOST APPROPRIATE FIX

In your case, the most appropriate solution is probably just to use Arrays.fill(), which is more likely to be implemented in an optimized way.

Note that the top and bottom blocks are big, contiguous chunks of memory, and can be handled by one Arrays.fill() each:

Arrays.fill(yuvData, 0, verMargin * width, (byte) 0);   //top
Arrays.fill(yuvData, width * height - verMargin * width, width * height, (byte) 0);    //bottom

And then the sides could be handled something like this:

for(int y=verMargin; y<height-verMargin; y++){
    int offset = y * width;
    Arrays.fill(yuvData, offset, offset + hozMargin, (byte) 0);  //left
    Arrays.fill(yuvData, offset + width - hozMargin, offset + width, (byte) 0);   //right
}
```

(Note that `Arrays.fill()` on a `byte[]` requires a `byte` fill value, hence the `(byte) 0` casts, and that the right-side fill runs from `offset + width - hozMargin` up to `offset + width`.)
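Putting both pieces together, a complete Arrays.fill()-based version of blackNonROI might look like this (a self-contained sketch under the question's assumptions, not the answerer's exact code):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlackNonROI {
    /** Arrays.fill()-based version of the question's blackNonROI: the top
     *  and bottom margins are two big contiguous fills, and the side
     *  margins take one fill per row each. Only the Y plane is touched. */
    static ByteBuffer blackNonROI(byte[] yuvData, int width, int height,
                                  int roiWidth, int roiHeight) {
        int hozMargin = (width - roiWidth) / 2;
        int verMargin = (height - roiHeight) / 2;
        Arrays.fill(yuvData, 0, verMargin * width, (byte) 0);                          // top
        Arrays.fill(yuvData, (height - verMargin) * width, height * width, (byte) 0);  // bottom
        for (int y = verMargin; y < height - verMargin; y++) {
            int offset = y * width;
            Arrays.fill(yuvData, offset, offset + hozMargin, (byte) 0);                 // left
            Arrays.fill(yuvData, offset + width - hozMargin, offset + width, (byte) 0); // right
        }
        return ByteBuffer.wrap(yuvData);
    }
}
```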

There are more opportunities for optimization here, but we're already at the point of diminishing returns. For example, since the end of each row is adjacent to the start of the next one (in memory), you could actually combine two smaller fill() calls into a larger one that covers both the right side of row N and the left side of row N + 1. And so forth.
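A sketch of that merged-fill idea (class and method names are hypothetical; only the side margins are handled, and the first left margin and last right margin are filled separately, since they have no neighbor to merge with):

```java
import java.util.Arrays;

public class MergedFill {
    /** Combines row N's right margin with row N+1's left margin into a
     *  single Arrays.fill, since the two regions are contiguous in the
     *  Y plane. Side margins only; top/bottom are handled elsewhere. */
    static void zeroSideMarginsMerged(byte[] yuvData, int width, int height,
                                      int roiWidth, int roiHeight) {
        int hozMargin = (width - roiWidth) / 2;
        int verMargin = (height - roiHeight) / 2;
        int firstRow = verMargin;
        int lastRow = height - verMargin - 1;
        // the very first left margin has no preceding right margin to merge with
        Arrays.fill(yuvData, firstRow * width, firstRow * width + hozMargin, (byte) 0);
        for (int y = firstRow; y < lastRow; y++) {
            int rightStart = y * width + width - hozMargin;
            // right margin of row y plus left margin of row y+1: one contiguous run
            Arrays.fill(yuvData, rightStart, rightStart + 2 * hozMargin, (byte) 0);
        }
        // the very last right margin also stands alone
        Arrays.fill(yuvData, lastRow * width + width - hozMargin,
                    (lastRow + 1) * width, (byte) 0);
    }
}
```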

2. Not sure. If your preview is displaying without any corruption/tearing, then it's probably a safe place to call the function from (from a thread-safety standpoint), and is therefore probably as good a place as any.

3 and 4. There could be libraries for doing this task; I don't know of any offhand for Java-based NV21 frames. You'd have to do some format conversions, and I don't think it'd be worth it. Using a GPU to do this work is excessive over-optimization, in my opinion, but it may be appropriate for some specialized applications. I'd consider going to JNI (native code) before I'd ever consider using the GPU.

I think your choice to do the manipulation directly to the NV21, instead of converting to a bitmap, is a good one (considering your needs and the fact that the task is simple enough to avoid needing a graphics library).
