Lowest overhead camera to CPU to GPU approach on Android

Question

My application needs to do some processing on live camera frames on the CPU, before rendering them on the GPU. There's also some other stuff being rendered on the GPU which is dependent on the results of the CPU processing; therefore it's important to keep everything synchronised so we don't render the frame itself on the GPU until the results of the CPU processing for that frame are also available.

The question is: what's the lowest-overhead approach for this on Android?

The CPU processing in my case just needs a greyscale image, so a YUV format where the Y plane is packed is ideal (and tends to be a good match for the native format of camera devices too). NV12, NV21 or fully planar YUV would all provide ideal low-overhead access to greyscale, so any of those would be preferred on the CPU side.
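
For illustration, here is a minimal sketch of pulling the packed Y plane out of a YUV_420_888 Image as delivered by an ImageReader (the helper name is my own, and the copy-out is optional: if the row stride is tight you could just as well process the ByteBuffer in place):

```java
import android.media.Image;
import java.nio.ByteBuffer;

final class LumaExtractor {
    /** Copies the Y (luma) plane of a YUV_420_888 Image into a tightly
     *  packed byte[] suitable for greyscale CPU processing. For this
     *  format the Y plane's pixel stride is guaranteed to be 1. */
    static byte[] extractLuma(Image image) {
        Image.Plane yPlane = image.getPlanes()[0];   // plane 0 is always Y
        ByteBuffer buf = yPlane.getBuffer();
        int width = image.getWidth();
        int height = image.getHeight();
        int rowStride = yPlane.getRowStride();       // may exceed width (padding)

        byte[] luma = new byte[width * height];
        if (rowStride == width) {
            buf.get(luma);                           // tightly packed: one bulk copy
        } else {
            for (int y = 0; y < height; y++) {
                buf.position(y * rowStride);         // skip the row padding
                buf.get(luma, y * width, width);
            }
        }
        return luma;
    }
}
```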

In the original Camera API, setPreviewCallbackWithBuffer() was the only sensible way to get data onto the CPU for processing. This delivered the Y plane separately, so it was ideal for the CPU processing. Getting that frame available to OpenGL for rendering in a low-overhead way was the more challenging aspect. In the end I wrote a NEON colour conversion routine to output RGB565 and just used glTexSubImage2d to get this available on the GPU. This was first implemented in the Nexus 1 timeframe, where even a 320x240 glTexSubImage2d call took 50ms of CPU time (poor drivers trying to do texture swizzling, I presume - this was significantly improved in a later system update).
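
For reference, the callback-with-buffer pattern looks roughly like this (a sketch assuming the preview size and display surface are already configured; processGreyscale() is a hypothetical stand-in for the CPU stage):

```java
import android.hardware.Camera;

class PreviewSetup {
    // Callback-with-buffer: we own the buffers, so there is no per-frame
    // allocation, and NV21's packed Y plane (the first width*height bytes
    // of `data`) is the greyscale image.
    static void startPreview(Camera camera, final int width, final int height) {
        final int bufSize = width * height * 3 / 2;   // NV21 is 12 bits per pixel
        for (int i = 0; i < 3; i++) {                 // small rotating buffer pool
            camera.addCallbackBuffer(new byte[bufSize]);
        }
        camera.setPreviewCallbackWithBuffer(new Camera.PreviewCallback() {
            @Override
            public void onPreviewFrame(byte[] data, Camera cam) {
                processGreyscale(data, width, height);  // hypothetical CPU stage
                cam.addCallbackBuffer(data);            // recycle the buffer
            }
        });
        camera.startPreview();
    }

    static void processGreyscale(byte[] nv21, int width, int height) {
        // ... CPU work on the Y plane: nv21[0 .. width*height-1] ...
    }
}
```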

Back in the day I looked into things like eglImage extensions, but they don't seem to be available or well documented enough for user apps. I had a little look into the internal Android GraphicBuffer classes, but ideally I want to stay in the world of supported public APIs.

The android.hardware.camera2 API had promise, with the ability to attach both an ImageReader and a SurfaceTexture to a capture session. Unfortunately I can't see any way of ensuring the right sequential pipeline here - holding off calling updateTexImage() until the CPU has finished processing is easy enough, but if another frame has arrived during that processing then updateTexImage() will skip straight to the latest frame. It also seems that with multiple outputs there are independent copies of the frames in each of the queues, which ideally I'd like to avoid.
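
For concreteness, attaching both outputs to one session looks roughly like this (a sketch with threading and error handling trimmed; every captured frame is delivered to both surfaces):

```java
import android.graphics.ImageFormat;
import android.graphics.SurfaceTexture;
import android.hardware.camera2.CameraAccessException;
import android.hardware.camera2.CameraCaptureSession;
import android.hardware.camera2.CameraDevice;
import android.hardware.camera2.CaptureRequest;
import android.media.ImageReader;
import android.view.Surface;
import java.util.Arrays;

class DualOutputSession {
    // One capture session, two outputs: ImageReader for CPU access,
    // SurfaceTexture for GPU rendering.
    static void start(CameraDevice device, SurfaceTexture texture,
                      int width, int height) throws CameraAccessException {
        ImageReader reader = ImageReader.newInstance(
                width, height, ImageFormat.YUV_420_888, /* maxImages= */ 3);
        texture.setDefaultBufferSize(width, height);
        Surface texSurface = new Surface(texture);

        final CaptureRequest.Builder request =
                device.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
        request.addTarget(reader.getSurface());
        request.addTarget(texSurface);

        device.createCaptureSession(
                Arrays.asList(reader.getSurface(), texSurface),
                new CameraCaptureSession.StateCallback() {
                    @Override public void onConfigured(CameraCaptureSession session) {
                        try {
                            session.setRepeatingRequest(request.build(), null, null);
                        } catch (CameraAccessException e) {
                            // camera went away; handle in real code
                        }
                    }
                    @Override public void onConfigureFailed(CameraCaptureSession session) {
                        // configuration rejected; handle in real code
                    }
                }, null /* handler: caller's looper */);
    }
}
```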

Ideally this is what I'd like:

  1. Camera driver fills a block of memory with the latest frame
  2. CPU gets a pointer to the data in that memory and reads out the Y data without a copy
  3. CPU processes the data and sets a flag in my code when the frame is ready (a sketch of this handshake follows the list)
  4. When beginning to render a frame, check whether a new frame is ready
  5. Call some API to bind that same memory as a GL texture
  6. When a newer frame is ready, release the buffer holding the previous frame back into the pool
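
A hypothetical sketch of the handshake in steps 3-4 might look like the following; note that it deliberately exposes the ordering problem described above, since updateTexImage() latches the newest queued frame rather than the one just processed:

```java
import android.graphics.SurfaceTexture;
import java.util.concurrent.atomic.AtomicBoolean;

class FrameSync {
    private final AtomicBoolean frameReady = new AtomicBoolean(false);

    // Processing thread: call once the CPU work for a frame is finished (step 3).
    void onCpuProcessingDone() {
        frameReady.set(true);
    }

    // GL thread: call at the start of each draw (step 4).
    void maybeLatchFrame(SurfaceTexture cameraTex) {
        if (frameReady.compareAndSet(true, false)) {
            // Caveat: updateTexImage() latches the *newest* queued frame,
            // not necessarily the one the CPU just finished with -- the
            // exact ordering problem described above.
            cameraTex.updateTexImage();
        }
    }
}
```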

I can't see a way of doing exactly that zero-copy style with the public API on Android, but what's the closest it's possible to get?

One crazy thing I tried that seems to work, but is not documented: the ANativeWindow NDK API can accept data in NV12 format, even though the appropriate format constant is not one of the ones in the public headers. That allows a SurfaceTexture to be filled with NV12 data by memcpy(), avoiding CPU-side colour conversion and any swizzling that happens driver-side in glTexImage2d. That is still an extra copy of the data, though, which feels like it should be unnecessary, and as it's undocumented it might not work on all devices. A supported sequential zero-copy Camera -> ImageReader -> SurfaceTexture pipeline, or equivalent, would be perfect.

Answer

The most efficient way to process video is to avoid the CPU altogether, but it sounds like that's not an option for you. The public APIs are generally geared toward doing everything in hardware, since that's what the framework itself needs, though there are some paths for RenderScript. (I'm assuming you've seen the Grafika filter demo that uses fragment shaders.)
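
For anyone who hasn't seen it, the Grafika-style approach keeps frames on the GPU by sampling the SurfaceTexture as an external OES texture and doing the work in the fragment shader. A rough sketch of what such shaders look like (not Grafika's exact source; the greyscale conversion is just an example filter):

```java
class ExternalTextureShaders {
    // Vertex shader: pass-through position, camera transform applied to the UVs.
    static final String VERTEX_SHADER =
            "uniform mat4 uTexMatrix;\n"
            + "attribute vec4 aPosition;\n"
            + "attribute vec4 aTextureCoord;\n"
            + "varying vec2 vTextureCoord;\n"
            + "void main() {\n"
            + "    gl_Position = aPosition;\n"
            + "    vTextureCoord = (uTexMatrix * aTextureCoord).xy;\n"
            + "}\n";

    // Fragment shader: samples the camera frame as an external OES texture;
    // the greyscale dot product is an example filter, not Grafika's code.
    static final String FRAGMENT_SHADER =
            "#extension GL_OES_EGL_image_external : require\n"
            + "precision mediump float;\n"
            + "varying vec2 vTextureCoord;\n"
            + "uniform samplerExternalOES sTexture;\n"
            + "void main() {\n"
            + "    vec4 c = texture2D(sTexture, vTextureCoord);\n"
            + "    float grey = dot(c.rgb, vec3(0.299, 0.587, 0.114));\n"
            + "    gl_FragColor = vec4(vec3(grey), 1.0);\n"
            + "}\n";
}
```

The texture behind sTexture is created with GLES11Ext.GL_TEXTURE_EXTERNAL_OES as its target, and uTexMatrix comes from SurfaceTexture.getTransformMatrix().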

Accessing the data on the CPU used to mean slow Camera APIs or working with GraphicBuffer and relatively obscure EGL functions (e.g. this question). The point of ImageReader was to provide zero-copy access to YUV data from the camera.

You can't really serialize Camera -> ImageReader -> SurfaceTexture as ImageReader doesn't have a "forward the buffer" API. Which is unfortunate, as that would make this trivial. You could try to replicate what SurfaceTexture does, using EGL functions to package the buffer as an external texture, but again you're into non-public GraphicBuffer-land, and I worry about ownership/lifetime issues of the buffer.

I'm not sure how the parallel paths help you (Camera2 -> ImageReader, Camera2 -> SurfaceTexture), as what's being sent to the SurfaceTexture wouldn't have your modifications. FWIW, it doesn't involve an extra copy -- in Lollipop or thereabouts, BufferQueue was updated to allow individual buffers to move through multiple queues.

It's entirely possible there are some fancy new APIs I haven't seen yet, but from what I know your ANativeWindow approach is probably the winner. I suspect you'd be better off with one of the Camera formats (YV12 or NV21) than NV12, but I don't know for sure.

FWIW, you will drop frames if your processing takes too long, but unless your processing is uneven (some frames take much longer than others) you'll have to drop frames no matter what. Getting into the realm of non-public APIs again, you could switch the SurfaceTexture to "synchronous" mode, but if your buffers fill up you're still dropping frames.
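
On the ImageReader side, the supported way to express a drop-to-newest policy is acquireLatestImage(), which discards any older pending frames. A minimal sketch (processGreyscale() again being a hypothetical stand-in for the CPU stage):

```java
import android.media.Image;
import android.media.ImageReader;

class DropToNewest {
    // acquireLatestImage() discards any older frames still queued in the
    // reader and returns only the most recent one (or null if none).
    static final ImageReader.OnImageAvailableListener LISTENER =
            new ImageReader.OnImageAvailableListener() {
        @Override
        public void onImageAvailable(ImageReader reader) {
            Image image = reader.acquireLatestImage();
            if (image == null) {
                return;                  // frame already consumed or dropped
            }
            try {
                processGreyscale(image); // hypothetical CPU stage
            } finally {
                image.close();           // return the buffer to the queue
            }
        }
    };

    static void processGreyscale(Image image) {
        // ... CPU work on image.getPlanes()[0] (the Y plane) ...
    }
}
```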
