Vision Framework with ARKit and CoreML

Problem Description

While I have been researching best practices and experimenting with multiple options for an ongoing project (i.e. a Unity3D iOS project in Vuforia with native integration, extracting frames with AVFoundation, then passing the image through cloud-based image recognition), I have come to the conclusion that I would like to use ARKit, the Vision Framework, and CoreML; let me explain.

I am wondering how I would be able to capture ARFrames and use the Vision Framework to detect and track a given object using a CoreML model.

Additionally, it would be nice to have a bounding box once the object is recognized, with the ability to add an AR object upon a touch gesture, but this is something that could be implemented after getting the solid project down.

This is undoubtedly possible, but I am unsure of how to pass the ARFrames to CoreML via Vision for processing.

Any ideas?

Recommended Answer

Update: Apple now has a sample code project that does some of these steps. Read on for those you still need to figure out yourself...

Just about all of the pieces are there for what you want to do... you mostly just need to put them together.

You obtain ARFrames either by periodically polling the ARSession for its currentFrame or by having them pushed to your session delegate. (If you're building your own renderer, that's ARSessionDelegate; if you're working with ARSCNView or ARSKView, their delegate callbacks refer to the view, so you can work back from there to the session to get the currentFrame that led to the callback.)
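
A minimal sketch of the push approach, assuming a hypothetical FrameGrabber class (the name is illustrative, not from the answer) assigned as the session's delegate:

    import ARKit
    import Foundation

    final class FrameGrabber: NSObject, ARSessionDelegate {
        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Called once per new frame; capturedImage is the camera image
            // as a CVPixelBuffer (see the next paragraph).
            let pixelBuffer = frame.capturedImage
            // ...hand pixelBuffer off to Vision here (sketched further below).
            _ = pixelBuffer
        }
    }

    // Polling instead: from a renderer callback you could read
    // sceneView.session.currentFrame and use its capturedImage the same way.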

ARFrame provides the current capturedImage in the form of a CVPixelBuffer.

You pass images to Vision for processing using either the VNImageRequestHandler or VNSequenceRequestHandler class, both of which have methods that take a CVPixelBuffer as an input image to process.

  • You use the image request handler if you want to perform a request that uses a single image — like finding rectangles or QR codes or faces, or using a Core ML model to identify the image.
  • You use the sequence request handler to perform requests that involve analyzing changes between multiple images, like tracking an object's movement after you've identified it (a rough tracking sketch follows this list).
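
A rough sketch of that tracking flow, assuming initialObservation comes from some earlier detection request and the same sequence handler is reused from frame to frame:

    import CoreGraphics
    import CoreVideo
    import Vision

    // One sequence handler kept alive across frames; it carries tracking state.
    let sequenceHandler = VNSequenceRequestHandler()

    func track(_ initialObservation: VNDetectedObjectObservation,
               in pixelBuffer: CVPixelBuffer) -> CGRect? {
        let request = VNTrackObjectRequest(detectedObjectObservation: initialObservation)
        request.trackingLevel = .accurate
        try? sequenceHandler.perform([request], on: pixelBuffer)
        guard let result = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        // Normalized coordinates, origin at the lower-left of the image.
        // In practice you feed this observation back in as the input for the next frame.
        return result.boundingBox
    }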

You can find general code for passing images to Vision + Core ML attached to the WWDC17 session on Vision, and if you watch that session the live demos also include passing CVPixelBuffers to Vision. (They get pixel buffers from AVCapture in that demo, but if you're getting buffers from ARKit the Vision part is the same.)
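
A rough sketch of that glue code; YourModel stands in for whatever class Xcode generates from your .mlmodel, and the image orientation and dispatch queue are assumptions rather than anything prescribed by the answer:

    import CoreML
    import CoreVideo
    import Foundation
    import Vision

    final class VisionRunner {
        private let request: VNCoreMLRequest

        init() throws {
            let visionModel = try VNCoreMLModel(for: YourModel().model)
            request = VNCoreMLRequest(model: visionModel) { request, error in
                // Inspect request.results here (see the observation types below).
            }
            request.imageCropAndScaleOption = .centerCrop
        }

        func run(on pixelBuffer: CVPixelBuffer) {
            // ARKit's capturedImage is in the sensor's native orientation;
            // .right is an assumption that matches portrait device orientation.
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                orientation: .right,
                                                options: [:])
            DispatchQueue.global(qos: .userInitiated).async {
                try? handler.perform([self.request])
            }
        }
    }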

One sticking point you're likely to have is identifying/locating objects. Most "object recognition" models people use with Core ML + Vision (including those that Apple provides pre-converted versions of on their ML developer page) are scene classifiers. That is, they look at an image and say, "this is a picture of a (thing)," not something like "there is a (thing) in this picture, located at (bounding box)".

Vision provides easy API for dealing with classifiers — your request's results array is filled in with VNClassificationObservation objects that tell you what the scene is (or "probably is", with a confidence rating).
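
Inside the request's completion handler, reading those results looks roughly like this (the 0.5 cut-off is an arbitrary choice for the sketch):

    import Vision

    func handleClassification(_ request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNClassificationObservation],
              let best = observations.first else { return }
        // Observations arrive sorted by confidence, highest first.
        if best.confidence > 0.5 {
            print("Scene is probably: \(best.identifier) (confidence \(best.confidence))")
        }
    }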

If you find or train a model that both identifies and locates objects — and for that part, I must stress, the ball is in your court — using Vision with it will result in VNCoreMLFeatureValueObservation objects. Those are sort of like arbitrary key-value pairs, so exactly how you identify an object from those depends on how you structure and label the outputs from your model.
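
A sketch of unpacking those observations; what the multiarray actually encodes (boxes, class scores, grid cells...) is entirely defined by your model, so the decoding step stays a comment:

    import CoreML
    import Vision

    func handleFeatureValues(_ request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNCoreMLFeatureValueObservation] else {
            return
        }
        for observation in observations {
            if let multiArray = observation.featureValue.multiArrayValue {
                // Decode according to how you structured and labeled your model's outputs.
                print("Model output with shape \(multiArray.shape)")
            }
        }
    }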

If you're dealing with something that Vision already knows how to recognize, instead of using your own model — stuff like faces and QR codes — you can get the locations of those in the image frame with Vision's API.
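
For example, a face-rectangle request (VNDetectBarcodesRequest works the same way and covers QR codes); each observation carries a normalized boundingBox with its origin at the lower-left of the image. You still run the request through one of the request handlers shown above:

    import CoreGraphics
    import Vision

    let faceRequest = VNDetectFaceRectanglesRequest { request, _ in
        for face in (request.results as? [VNFaceObservation]) ?? [] {
            // boundingBox is in normalized image coordinates (0...1, lower-left origin).
            print("Face at \(face.boundingBox)")
        }
    }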

If after locating an object in the 2D image, you want to display 3D content associated with it in AR (or display 2D content, but with said content positioned in 3D with ARKit), you'll need to hit test those 2D image points against the 3D world.
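
A sketch of that last step, assuming you have already converted the 2D result into a point in the ARSCNView's coordinate space (that conversion depends on image orientation and the display transform and is not shown); sceneView and the sphere marker are illustrative:

    import ARKit
    import SceneKit
    import UIKit

    func placeMarker(at screenPoint: CGPoint, in sceneView: ARSCNView) {
        // Hit test the 2D point against ARKit's understanding of the 3D scene.
        let results = sceneView.hitTest(screenPoint,
                                        types: [.existingPlaneUsingExtent, .featurePoint])
        guard let result = results.first else { return }
        // Place a small marker node at the hit location in world space.
        let node = SCNNode(geometry: SCNSphere(radius: 0.01))
        node.simdTransform = result.worldTransform
        sceneView.scene.rootNode.addChildNode(node)
    }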

Once you get to this step, placing AR content with a hit test is something that's already pretty well covered elsewhere, both by Apple and the community.
