Camera pose estimation from homography or with solvePnP() function


Problem description

I'm trying to build a static augmented reality scene over a photo, with 4 defined correspondences between coplanar points on a plane and the image.

Here is the step-by-step flow:

  1. The user adds an image using the device's camera. Let's assume it contains a rectangle captured at some angle.
  2. The user defines the physical size of the rectangle, which lies in the horizontal plane (YOZ in SceneKit terms). Let's assume its center is the world's origin (0, 0, 0), so we can easily find (x, y, z) for each corner.
  3. The user defines the uv coordinates in the image coordinate system for each corner of the rectangle.
  4. A SceneKit scene is created with a rectangle of the same size, visible at the same angle.
  5. Other nodes can be added and moved in the scene.

I've also measured the position of the iPhone camera relative to the center of the A4 paper. For this shot the position was (0, 14, 42.5), measured in cm. My iPhone was also slightly pitched towards the table (by 5-10 degrees).

Using this data I've set up an SCNCamera to get the desired perspective of the blue plane in the third image:

let camera = SCNCamera()
camera.xFov = 66
camera.zFar = 1000
camera.zNear = 0.01

cameraNode.camera = camera
cameraAngle = -7 * CGFloat.pi / 180
cameraNode.rotation = SCNVector4(x: 1, y: 0, z: 0, w: Float(cameraAngle))
cameraNode.position = SCNVector3(x: 0, y: 14, z: 42.5)

This will give me a reference to compare my result with.

In order to build AR with SceneKit I need to:

  1. Adjust the SCNCamera's field of view so it matches the real camera's field of view.
  2. Calculate the position and rotation of the camera node using the 4 correspondences between world points (x, 0, z) and image points (u, v).

H - homography; K - intrinsic matrix; [R | t] - extrinsic matrix
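
For points lying on the model plane (written as (X, Y, 0) in the plane's own coordinates), the standard planar relation behind the manual approach below is, up to scale:

s * [u, v, 1]^T = K * [r1 r2 t] * [X, Y, 1]^T      =>      H ≅ K * [r1 r2 t]

So the first two columns of K⁻¹H give the (scaled) rotation columns r1 and r2, the third column gives the translation t, and r3 is recovered as r1 × r2.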

I tried two approaches in order to find the transform matrix for the camera: using solvePnP from OpenCV, and manual calculation from the homography based on 4 coplanar points.

1. Finding the homography

This step was completed successfully, since the UV coordinates of the world's origin appear to be correct.
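
For reference, a minimal OpenCV sketch of this step (the corner and pixel values below are placeholders, not the measurements from the actual photo):

#include <opencv2/calib3d.hpp>
#include <vector>

// Map the 4 rectangle corners on the world plane (2D plane coordinates, in cm)
// to their pixel locations (u, v) in the photo.
cv::Mat homographyFromCorners() {
    std::vector<cv::Point2f> planePoints = {
        {-10.5f, -14.85f}, {10.5f, -14.85f}, {10.5f, 14.85f}, {-10.5f, 14.85f}
    };
    std::vector<cv::Point2f> imagePoints = {
        {120.f, 380.f}, {520.f, 370.f}, {460.f, 150.f}, {180.f, 160.f}
    };
    // With exactly 4 correspondences the homography is fully determined,
    // so method 0 (plain least squares, no RANSAC) is enough.
    return cv::findHomography(planePoints, imagePoints, 0);
}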

2. Intrinsic matrix

In order to get the intrinsic matrix of the iPhone 6, I used this app, which gave me the following result out of 100 images of 640*480 resolution:

Assuming that the input image has a 4:3 aspect ratio, I can scale the above matrix depending on the resolution.
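
A sketch of this scaling, assuming fx, fy, cx, cy scale linearly from the 640*480 calibration resolution to the target image size (scaleIntrinsics is just an illustrative helper, not code from the project):

#include <opencv2/core.hpp>

// Scale an intrinsic matrix calibrated at 640*480 to the actual image resolution.
// Assumes the same aspect ratio and sensor crop, so the entries scale linearly.
cv::Mat scaleIntrinsics(const cv::Mat &calibrated, cv::Size imageSize) {
    double sx = imageSize.width  / 640.0;
    double sy = imageSize.height / 480.0;
    cv::Mat K = calibrated.clone();
    K.at<double>(0, 0) *= sx;  // fx
    K.at<double>(0, 2) *= sx;  // cx
    K.at<double>(1, 1) *= sy;  // fy
    K.at<double>(1, 2) *= sy;  // cy
    return K;
}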

I am not sure, but this feels like a potential problem here. I used cv::calibrationMatrixValues to check fovx for the calculated intrinsic matrix, and the result was ~50°, while it should be close to 60°.
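
The check itself looks roughly like this, with cameraMatrix being the calibrated 3x3 matrix above (aperture width/height are passed as 0, since only the field of view in degrees is of interest here):

double fovx, fovy, focalLength, aspectRatio;
cv::Point2d principalPoint;
cv::calibrationMatrixValues(cameraMatrix, cv::Size(640, 480), 0.0, 0.0,
                            fovx, fovy, focalLength, principalPoint, aspectRatio);
std::cout << "fovx: " << fovx << " degrees" << std::endl; // ~50 here, expected ~60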

3. Camera pose matrix

func findCameraPose(homography h: matrix_float3x3, size: CGSize) -> matrix_float4x3? {
    guard let intrinsic = intrinsicMatrix(imageSize: size),
        let intrinsicInverse = intrinsic.inverse else { return nil }

    // Scale factors: the first two columns of K⁻¹H should be unit-length rotation columns
    let l1 = 1.0 / (intrinsicInverse * h.columns.0).norm
    let l2 = 1.0 / (intrinsicInverse * h.columns.1).norm
    let l3 = (l1 + l2) / 2

    // r1 and r2 come from the homography, r3 is their cross product
    let r1 = l1 * (intrinsicInverse * h.columns.0)
    let r2 = l2 * (intrinsicInverse * h.columns.1)
    let r3 = cross(r1, r2)

    // Translation from the third column, scaled by the averaged factor
    let t = l3 * (intrinsicInverse * h.columns.2)

    return matrix_float4x3(columns: (r1, r2, r3, t))
}

Result:

Since I measured the approximate position and orientation for this particular image, I know the transform matrix that would give the expected result, and it is quite different:

I am also a bit concerned about the [2,3] element of the reference rotation matrix, which is -9.1, while it should be close to zero instead, since there is only a very slight rotation.

There is a solvePnP function in OpenCV for this kind of problem, so I tried to use it instead of reinventing the wheel.

OpenCV in Objective-C++:

typedef struct CameraPose {
    SCNVector4 rotationVector;
    SCNVector3 translationVector; 
} CameraPose;

+ (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {

    vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
    vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];

    cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
    cv::Mat rvec(3,1,cv::DataType<double>::type);
    cv::Mat tvec(3,1,cv::DataType<double>::type);
    cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];

    cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    SCNVector4 rotationVector = SCNVector4Make(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2), norm(rvec));
    SCNVector3 translationVector = SCNVector3Make(tvec.at<double>(0), tvec.at<double>(1), tvec.at<double>(2));
    CameraPose result = CameraPose{rotationVector, translationVector};

    return result;
}

+ (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array withSize: (CGSize) size {
    vector<Point2f> points;
    for (NSValue * value in array) {
        CGPoint point = [value CGPointValue];
        // shift the origin to the image center, since the intrinsic matrix below uses a zero principal point
        points.push_back(Point2f(point.x - size.width/2, point.y - size.height/2));
    }
    return points;
}

+ (vector<Point3f>) convertObjectPoints: (NSArray<NSValue *> *) array {
    vector<Point3f> points;
    for (NSValue * value in array) {
        CGPoint point = [value CGPointValue];
        // model points lie in the horizontal plane: 2D plane point (x, y) -> world point (x, 0, -y)
        points.push_back(Point3f(point.x, 0.0, -point.y));
    }
    return points;
}

+ (cv::Mat) intrinsicMatrixWithImageSize: (CGSize) imageSize {
    double f = 0.84 * max(imageSize.width, imageSize.height);
    Mat result(3,3,cv::DataType<double>::type);
    cv::setIdentity(result);
    result.at<double>(0) = f;
    result.at<double>(4) = f;
    return result;
}
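
A side note on the 0.84 factor in intrinsicMatrixWithImageSize:: for a pinhole camera the focal length in pixels relates to the horizontal field of view as f = (w / 2) / tan(FOVx / 2), where w is the larger image dimension here. f = 0.84 * w therefore corresponds to a horizontal FOV of roughly 61.5°, reasonably close to the 66° set on the SCNCamera above.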

Usage in Swift:

func testSolvePnP() {
    let source = modelPoints().map { NSValue(cgPoint: $0) }
    let destination = perspectivePicker.currentPerspective.map { NSValue(cgPoint: $0)}

    let cameraPose = CameraPoseDetector.findCameraPose(source, imagePoints: destination, size: backgroundImageView.size);    
    cameraNode.rotation = cameraPose.rotationVector
    cameraNode.position = cameraPose.translationVector
}

Output:

The result is better, but far from my expectations.

Other things I also tried:

  1. This question is very similar, though I don't understand how the accepted answer works without intrinsics.
  2. decomposeHomographyMat also didn't give me the result I expected (see the sketch after this list).
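
For completeness, a minimal sketch of how decomposeHomographyMat is typically called, with H and cameraMatrix taken from the steps above; it returns up to four candidate {R, t, n} solutions that still have to be disambiguated (e.g. by checking which one keeps the points in front of the camera), which may be where this attempt went wrong:

std::vector<cv::Mat> rotations, translations, normals;
int solutions = cv::decomposeHomographyMat(H, cameraMatrix, rotations, translations, normals);
for (int i = 0; i < solutions; i++) {
    // note: the translations are only known up to the (unknown) distance to the plane
    std::cout << "R" << i << ":\n" << rotations[i]
              << "\nt" << i << ": " << translations[i].t()
              << "\nn" << i << ": " << normals[i].t() << std::endl;
}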

I am really stuck on this problem, so any help would be greatly appreciated.

Answer

Actually, I was one step away from the working solution with OpenCV.

My problem with the second approach was that I forgot to convert the output of solvePnP back to SceneKit's coordinate system.

Note that the input (image and world points) was actually converted correctly to the OpenCV coordinate system (see the convertObjectPoints: and convertImagePoints:withSize: methods above).

So here is the fixed findCameraPose method, with some comments and intermediate results printed:

+ (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {

    vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
    vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];

    std::cout << "object points: " << cvObjectPoints << std::endl;
    std::cout << "image points: " << cvImagePoints << std::endl;

    cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
    cv::Mat rvec(3,1,cv::DataType<double>::type);
    cv::Mat tvec(3,1,cv::DataType<double>::type);
    cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];

    cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    std::cout << "rvec: " << rvec << std::endl;
    std::cout << "tvec: " << tvec << std::endl;

    std::vector<cv::Point2f> projectedPoints;
    cvObjectPoints.push_back(Point3f(0.0, 0.0, 0.0)); // also project the world origin as a sanity check
    cv::projectPoints(cvObjectPoints, rvec, tvec, cameraMatrix, distCoeffs, projectedPoints);

    // only the first cvImagePoints.size() entries have a measured image point to compare with
    for (unsigned int i = 0; i < cvImagePoints.size(); ++i) {
        std::cout << "Image point: " << cvImagePoints[i] << " Projected to " << projectedPoints[i] << std::endl;
    }
    std::cout << "World origin projected to " << projectedPoints.back() << std::endl;


    cv::Mat RotX(3, 3, cv::DataType<double>::type);
    cv::setIdentity(RotX);
    RotX.at<double>(4) = -1; //cos(180) = -1
    RotX.at<double>(8) = -1;

    cv::Mat R;
    cv::Rodrigues(rvec, R);

    R = R.t();  // rotation of the inverse transform, i.e. camera orientation in world coordinates
    Mat rvecConverted;
    Rodrigues(R, rvecConverted); // back to an axis-angle rotation vector
    std::cout << "rvec in world coords:\n" << rvecConverted << std::endl;
    rvecConverted = RotX * rvecConverted;
    std::cout << "rvec scenekit :\n" << rvecConverted << std::endl;

    Mat tvecConverted = -R * tvec;
    std::cout << "tvec in world coords:\n" << tvecConverted << std::endl;
    tvecConverted = RotX * tvecConverted;
    std::cout << "tvec scenekit :\n" << tvecConverted << std::endl;

    SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
    SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));

    return CameraPose{rotationVector, translationVector};
}

Notes:

  1. The RotX matrix represents a rotation by 180 degrees around the x axis, which transforms any vector from the OpenCV coordinate system to SceneKit's.

  2. The Rodrigues method transforms a rotation vector into a rotation matrix (3x3) and vice versa.
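
In other words, the conversion above takes the solvePnP extrinsics (which map world coordinates into OpenCV camera coordinates), inverts them to get the camera pose in world coordinates, and then re-expresses both vectors in SceneKit's axes:

R_world = R^T
t_world = -R^T * t
r_scenekit = RotX * r_world      (r_world is the axis-angle vector of R_world)
t_scenekit = RotX * t_world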
