iOS revert camera projection


Problem description


I'm trying to estimate my device position related to a QR code in space. I'm using ARKit and the Vision framework, both introduced in iOS11, but the answer to this question probably doesn't depend on them.

With the Vision framework, I'm able to get the rectangle that bounds a QR code in the camera frame. I'd like to match this rectangle to the device translation and rotation necessary to transform the QR code from a standard position.

For instance if I observe the frame:

*            *

    B
          C
  A
       D


*            *

while if I was 1m away from the QR code, centered on it, and assuming the QR code has a side of 10cm I'd see:

*            *


    A0  B0

    D0  C0


*            *

What was my device transformation between those two frames? I understand that an exact result might not be possible, because the observed QR code may be slightly non-planar, and we're trying to estimate an affine transform on something that isn't perfectly one.

I guess the sceneView.pointOfView?.camera?.projectionTransform is more helpful than sceneView.session.currentFrame?.camera.projectionMatrix, since the latter already takes into account the transform inferred by ARKit, which I'm not interested in for this problem.

How would I fill

func getTransform(
  qrCodeRectangle: VNBarcodeObservation,
  cameraTransform: SCNMatrix4) {
  // qrCodeRectangle.topLeft etc is the position in [0, 1] * [0, 1] of A0

  // expected real world position of the QR code in a referential coordinate system
  let a0 = SCNVector3(x: -0.05, y: 0.05, z: 1)
  let b0 = SCNVector3(x: 0.05, y: 0.05, z: 1)
  let c0 = SCNVector3(x: 0.05, y: -0.05, z: 1)
  let d0 = SCNVector3(x: -0.05, y: -0.05, z: 1)

  let A0, B0, C0, D0 = ?? // CGPoints representing position in
                          // camera frame for camera in 0, 0, 0 facing Z+

  // then get transform from 0, 0, 0 to current position/rotation that sees
  // a0, b0, c0, d0 through the camera as qrCodeRectangle 
}
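For reference, the missing A0…D0 are just the perspective projections of a0…d0: with an ideal pinhole camera sitting at the origin and facing Z+, each 3D corner maps to (f · x / z, f · y / z) on the image plane. A minimal, language-agnostic sketch in Python (the `project` helper and the unit focal length are hypothetical, not part of the question's code):

```python
# Perspective projection of the ideal QR corners a0..d0 through a
# pinhole camera at the origin facing Z+ (hypothetical sketch).

def project(point, f=1.0):
    """Project a 3D point (x, y, z) onto the image plane at focal length f."""
    x, y, z = point
    return (f * x / z, f * y / z)

# QR code corners: 10 cm side, centered, 1 m in front of the camera
a0 = (-0.05, 0.05, 1.0)
b0 = (0.05, 0.05, 1.0)
c0 = (0.05, -0.05, 1.0)
d0 = (-0.05, -0.05, 1.0)

# Since z is constant across the four corners, the square projects
# undistorted, which is exactly the centered A0 B0 / D0 C0 view above
A0, B0, C0, D0 = (project(p) for p in (a0, b0, c0, d0))
print(A0, B0, C0, D0)
```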

====Edit====

After trying a number of things, I ended up doing camera pose estimation using OpenCV's projection and perspective solver, solvePnP. This gives me a rotation and translation that should represent the camera pose in the QR code referential. However, when using those values and placing objects corresponding to the inverse transformation, where the QR code should be in camera space, I get inaccurate, shifted values, and I'm not able to get the rotation to work:

// some flavor of pseudo code below
func renderer(_ sender: SCNSceneRenderer, updateAtTime time: TimeInterval) {
  guard let currentFrame = sceneView.session.currentFrame, let pov = sceneView.pointOfView else { return }
  let intrinsics = currentFrame.camera.intrinsics
  let QRCornerCoordinatesInQRRef = [(-0.05, -0.05, 0), (0.05, -0.05, 0), (-0.05, 0.05, 0), (0.05, 0.05, 0)]

  // uses VNDetectBarcodesRequest to find a QR code and returns a bounding rectangle
  guard let qr = findQRCode(in: currentFrame) else { return }

  let imageSize = CGSize(
    width: CVPixelBufferGetWidth(currentFrame.capturedImage),
    height: CVPixelBufferGetHeight(currentFrame.capturedImage)
  )

  // Vision coordinates are normalized with a lower-left origin,
  // hence the (1 - y) flip when converting to pixel coordinates
  let observations = [
    qr.bottomLeft,
    qr.bottomRight,
    qr.topLeft,
    qr.topRight,
  ].map({ (imageSize.height * (1 - $0.y), imageSize.width * $0.x) })
  // image and SceneKit coordinates are not the same
  // replacing this by:
  // (imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
  // weirdly fixes an issue, see below

  // calls OpenCV solvePnP and gets the results
  let (rotation, translation) = openCV.solvePnP(QRCornerCoordinatesInQRRef, observations, intrinsics)

  let positionInCameraRef = -rotation.inverted * translation
  let node = SCNNode(geometry: someGeometry)
  pov.addChildNode(node)
  node.position = translation
  node.orientation = rotation.asQuaternion
}
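The Vision-to-pixel mapping used in the `map` above can be isolated and checked on its own: Vision returns normalized coordinates with a lower-left origin, while pixel-buffer rows run top-down, hence the `1 - y` flip. A small Python sketch of just that conversion (the test points are made up for illustration):

```python
# Convert Vision's normalized, lower-left-origin coordinates to
# pixel coordinates in a top-left-origin image (sketch).

IMAGE_W, IMAGE_H = 1280.0, 720.0  # from the question's imageSize

def vision_to_pixel(pt):
    """Map a normalized Vision point (x, y) to (row, col) pixel coords."""
    x, y = pt
    return (IMAGE_H * (1 - y), IMAGE_W * x)

# Vision's origin (bottom-left of the image) lands on the last pixel row
assert vision_to_pixel((0.0, 0.0)) == (720.0, 0.0)
# the normalized top-right corner lands on row 0, last column
assert vision_to_pixel((1.0, 1.0)) == (0.0, 1280.0)
```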

Here is the output:

where A, B, C, D are the QR code corners in the order they are passed to the program.

The predicted origin stays in place when the phone rotates, but it's shifted from where it should be. Surprisingly, if I shift the observation values, I'm able to correct this:

  // (imageSize.height * (1 - $0.y), imageSize.width * $0.x)
  // replaced by:
  (imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))

and now the predicted origin stays robustly in place. However, I don't understand where the shift values come from.

Finally, I've tried to get an orientation fixed relatively to the QR code referential:

    var n = SCNNode(geometry: redGeometry)
    node.addChildNode(n)
    n.position = SCNVector3(0.1, 0, 0)
    n = SCNNode(geometry: blueGeometry)
    node.addChildNode(n)
    n.position = SCNVector3(0, 0.1, 0)
    n = SCNNode(geometry: greenGeometry)
    node.addChildNode(n)
    n.position = SCNVector3(0, 0, 0.1)

The orientation is fine when I look at the QR code straight, but then it shifts by something that seems to be related to the phone rotation:

Outstanding questions I have are:

  • How do I solve the rotation?
  • Where do the position shift values come from?
  • What simple relationship do rotation, translation, QRCornerCoordinatesInQRRef, observations, and intrinsics verify? Is it O ~ K^-1 * (R_3x2 | T) Q? Because if so, that's off by a few orders of magnitude.
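On the last bullet: the pinhole relation is usually written o ~ K · (R | T) · Q in homogeneous coordinates, so pixel coordinates only come out after dividing by the third component, and K (not K^-1) sits on that side. Dropping the perspective divide alone leaves the raw products off by roughly a factor of the depth, which could account for an order-of-magnitude gap. A numeric sketch with the question's intrinsics (NumPy; the pose values are hypothetical):

```python
import numpy as np

# Intrinsics matrix from the question's numerical values
K = np.array([
    [1090.318, 0.0, 618.661],
    [0.0, 1090.318, 359.616],
    [0.0, 0.0, 1.0],
])

# Hypothetical pose: camera 1 m in front of the QR code, no rotation
R = np.eye(3)
T = np.array([0.0, 0.0, 1.0])

def project(Q):
    """Project a 3D point Q (in QR-code coordinates) to pixel coordinates."""
    p = K @ (R @ Q + T)   # homogeneous image point
    return p[:2] / p[2]   # perspective divide: the step that is easy to drop

# The QR code center should project exactly to the principal point
center = project(np.array([0.0, 0.0, 0.0]))
print(center)  # [618.661 359.616]
```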

If that's helpful, here are a few numerical values:

Intrinsics matrix
Mat 3x3
1090.318, 0.000, 618.661
0.000, 1090.318, 359.616
0.000, 0.000, 1.000

imageSize
1280.0, 720.0
screenSize
414.0, 736.0

==== Edit2 ====

I've noticed that the rotation works fine when the phone stays horizontally parallel to the QR code (i.e. the rotation matrix is [[a, 0, b], [0, 1, 0], [c, 0, d]]), no matter what the actual QR code orientation is:

Other rotations don't work.

Solution

Coordinate systems' correspondence

Take into consideration that the Vision/CoreML coordinate system doesn't correspond to the ARKit/SceneKit coordinate system. For details, look at this post.

Rotation's direction

I suppose the problem is not in the matrix but in the placement of the vertices. For tracking 2D images you need to place the ABCD vertices counter-clockwise (the starting point being the A vertex, located at the imaginary origin x: 0, y: 0). I think the Apple documentation on the VNRectangleObservation class (info about projected rectangular regions detected by an image analysis request) is vague about this. You placed your vertices in the same order as in the official documentation:

var bottomLeft: CGPoint
var bottomRight: CGPoint
var topLeft: CGPoint
var topRight: CGPoint

But they need to be placed the same way the positive rotation direction (about the Z axis) runs in a Cartesian coordinate system:

World Coordinate Space in ARKit (as well as in SceneKit and Vision) always follows a right-handed convention (the positive Y axis points upward, the positive Z axis points toward the viewer and the positive X axis points toward the viewer's right), but is oriented based on your session's configuration. Camera works in Local Coordinate Space.

Rotation direction about any axis is positive (Counter-Clockwise) and negative (Clockwise). For tracking in ARKit and Vision it's critically important.
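Whether a given ABCD ordering is counter-clockwise in a Y-up frame can be checked with the signed (shoelace) area: positive means counter-clockwise. A small Python sketch (the corner values are illustrative, not taken from the question):

```python
def signed_area(corners):
    """Shoelace formula: > 0 means counter-clockwise in a Y-up frame."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0

# A, B, C, D starting at the origin and going counter-clockwise
ccw = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cw = list(reversed(ccw))

assert signed_area(ccw) > 0  # counter-clockwise: positive rotation direction
assert signed_area(cw) < 0   # clockwise: negative rotation direction
```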

The order of rotation also matters. ARKit, as well as SceneKit, applies rotation relative to the node's pivot property in the reverse order of the components: first roll (about the Z axis), then yaw (about the Y axis), then pitch (about the X axis). So the rotation order is ZYX.
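That composition order is not a cosmetic detail: matrix multiplication is not commutative, so composing the same three Euler angles in different orders yields different rotations. A NumPy sketch demonstrating this (the angles are arbitrary examples):

```python
import numpy as np

def rx(a):
    """Elemental rotation about the X axis (pitch)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def ry(a):
    """Elemental rotation about the Y axis (yaw)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rz(a):
    """Elemental rotation about the Z axis (roll)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

roll, yaw, pitch = 0.3, 0.5, 0.7  # arbitrary example angles

# Compose the same three elemental rotations in two different orders
R_a = rz(roll) @ ry(yaw) @ rx(pitch)
R_b = rx(pitch) @ ry(yaw) @ rz(roll)

# The two compositions genuinely differ, so the engine's convention matters
print(np.allclose(R_a, R_b))  # False
```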
