Camera pose estimation from homography or with solvePnP() function

Problem description

    I'm trying to build a static augmented reality scene over a photo, with 4 defined correspondences between coplanar points on a plane and on the image.

    Here is a step by step flow:

    1. User adds an image using the device's camera. Let's assume it contains a rectangle captured with some perspective.
    2. User defines the physical size of the rectangle, which lies in a horizontal plane (YOZ in terms of SceneKit). Let's assume its center is the world's origin (0, 0, 0), so we can easily find (x, y, z) for each corner.
    3. User defines uv coordinates in the image coordinate system for each corner of the rectangle.
    4. A SceneKit scene is created with a rectangle of the same size, visible from the same perspective.
    5. Other nodes can be added and moved in the scene.

    I've also measured the position of the iPhone camera relative to the center of the A4 paper. For this shot the position was (0, 14, 42.5), measured in cm. My iPhone was also slightly pitched towards the table (5-10 degrees).

    Using this data I've set up SCNCamera to get the desired perspective of the blue plane on the third image:

    let camera = SCNCamera()
    camera.xFov = 66
    camera.zFar = 1000
    camera.zNear = 0.01
    
    cameraNode.camera = camera
    cameraAngle = -7 * CGFloat.pi / 180
    cameraNode.rotation = SCNVector4(x: 1, y: 0, z: 0, w: Float(cameraAngle))
    cameraNode.position = SCNVector3(x: 0, y: 14, z: 42.5)
    

    This will give me a reference to compare my result with.

    In order to build AR with SceneKit I need to:

    1. Adjust SCNCamera's fov so that it matches the real camera's fov.
    2. Calculate the position and rotation of the camera node using the 4 correspondences between world points (x, 0, z) and image points (u, v).

    H - homography; K - Intrinsic matrix; [R | t] - Extrinsic matrix
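
    For points on the plane (written as (X, Y, 0) in the plane's own frame), the standard relation between these quantities is

        s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \, [\, r_1 \;\; r_2 \;\; t \,] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}, \qquad \text{i.e.} \qquad H \simeq K \, [\, r_1 \;\; r_2 \;\; t \,],

    so the pose can be recovered column by column:

        \lambda_1 = \frac{1}{\lVert K^{-1} h_1 \rVert}, \quad \lambda_2 = \frac{1}{\lVert K^{-1} h_2 \rVert}, \quad r_i = \lambda_i K^{-1} h_i, \quad r_3 = r_1 \times r_2, \quad t = \frac{\lambda_1 + \lambda_2}{2} K^{-1} h_3.

    This is what the manual approach below implements.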

    I tried two approaches to find the transform matrix for the camera: using solvePnP from OpenCV, and a manual calculation from the homography based on the 4 coplanar points.

    Manual approach:

    1. Find the homography

    This step was done successfully, since the UV coordinates of the world's origin seem to be correct.
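
    For reference, here is a minimal sketch of how the homography can be computed from the 4 corner correspondences with OpenCV (the function and variable names are illustrative, not the ones from my project):

    #include <opencv2/calib3d.hpp>
    #include <vector>

    // Sketch: estimate H from the 4 plane/image correspondences. worldXY holds the
    // plane coordinates of the rectangle corners, imageUV the matching pixel points.
    cv::Mat homographyFromCorners(const std::vector<cv::Point2f>& worldXY,
                                  const std::vector<cv::Point2f>& imageUV)
    {
        // With exactly 4 correspondences the default least-squares method
        // (method = 0) solves the system exactly; no RANSAC is needed.
        return cv::findHomography(worldXY, imageUV, 0);
    }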

    2. Intrinsic matrix

    In order to get the intrinsic matrix of the iPhone 6, I used this app, which gave me the following result from 100 images at 640×480 resolution:

    Assuming that the input image has a 4:3 aspect ratio, I can scale the above matrix depending on the resolution.

    I am not sure, but this feels like a potential problem. I used cv::calibrationMatrixValues to check fovx for the calculated intrinsic matrix, and the result was ~50°, while it should be close to 60°.
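
    As a sanity check, the scaling and the fovx check can be written out explicitly (a sketch only; the 640×480 calibration matrix itself comes from the calibration app and is not reproduced here, and the function name is illustrative):

    #include <opencv2/calib3d.hpp>
    #include <iostream>

    // Scale a camera matrix calibrated at 640x480 up to the actual image size and
    // print the field of view it implies.
    void checkScaledIntrinsics(const cv::Mat& K640, const cv::Size& imageSize)
    {
        double s = imageSize.width / 640.0;                  // same 4:3 aspect ratio assumed
        cv::Mat K = K640.clone();
        K.at<double>(0, 0) *= s;  K.at<double>(0, 2) *= s;   // fx, cx
        K.at<double>(1, 1) *= s;  K.at<double>(1, 2) *= s;   // fy, cy

        double fovx, fovy, focalLength, aspectRatio;
        cv::Point2d principalPoint;
        // The physical aperture size is unknown, so pass 0 and look only at the fovs.
        cv::calibrationMatrixValues(K, imageSize, 0, 0, fovx, fovy,
                                    focalLength, principalPoint, aspectRatio);
        std::cout << "fovx: " << fovx << ", fovy: " << fovy << std::endl;
    }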

    3. Camera pose matrix

    func findCameraPose(homography h: matrix_float3x3, size: CGSize) -> matrix_float4x3? {
        guard let intrinsic = intrinsicMatrix(imageSize: size),
            let intrinsicInverse = intrinsic.inverse else { return nil }
    
        let l1 = 1.0 / (intrinsicInverse * h.columns.0).norm
        let l2 = 1.0 / (intrinsicInverse * h.columns.1).norm
        let l3 = (l1+l2)/2
    
        let r1 = l1 * (intrinsicInverse * h.columns.0)
        let r2 = l2 * (intrinsicInverse * h.columns.1)
        let r3 = cross(r1, r2)
    
        let t = l3 * (intrinsicInverse * h.columns.2)
    
        return matrix_float4x3(columns: (r1, r2, r3, t))
    }
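
    For cross-checking against the OpenCV path below, the same decomposition can also be expressed with cv::Mat (a sketch under the same assumptions as the Swift code above; H and K are 3x3 CV_64F matrices, and the function name is illustrative):

    #include <opencv2/core.hpp>
    #include <vector>

    // Same column-by-column decomposition as findCameraPose above, returning a 3x4 [R | t].
    cv::Mat cameraPoseFromHomography(const cv::Mat& H, const cv::Mat& K)
    {
        cv::Mat Kinv = K.inv();
        cv::Mat h1 = H.col(0), h2 = H.col(1), h3 = H.col(2);

        double l1 = 1.0 / cv::norm(Kinv * h1);
        double l2 = 1.0 / cv::norm(Kinv * h2);
        double l3 = (l1 + l2) / 2.0;

        cv::Mat r1 = l1 * (Kinv * h1);
        cv::Mat r2 = l2 * (Kinv * h2);
        cv::Mat r3 = r1.cross(r2);      // third rotation column from orthogonality
        cv::Mat t  = l3 * (Kinv * h3);

        cv::Mat pose;
        cv::hconcat(std::vector<cv::Mat>{r1, r2, r3, t}, pose);
        return pose;
    }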
    

    Result:

    Since I measured the approximate position and orientation for this particular image, I know the transform matrix that would give the expected result, and the computed one is quite different:

    I am also a bit concerned about element 2-3 of the reference rotation matrix, which is -9.1, while it should be close to zero instead, since the rotation is very slight.

    OpenCV approach:

    There is a solvePnP function in OpenCV for this kind of problem, so I tried to use it instead of reinventing the wheel.

    OpenCV in Objective-C++:

    typedef struct CameraPose {
        SCNVector4 rotationVector;
        SCNVector3 translationVector; 
    } CameraPose;
    
    + (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {
    
        vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
        vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];
    
        cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
        cv::Mat rvec(3,1,cv::DataType<double>::type);
        cv::Mat tvec(3,1,cv::DataType<double>::type);
        cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];
    
        cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);
    
        SCNVector4 rotationVector = SCNVector4Make(rvec.at<double>(0), rvec.at<double>(1), rvec.at<double>(2), norm(rvec));
        SCNVector3 translationVector = SCNVector3Make(tvec.at<double>(0), tvec.at<double>(1), tvec.at<double>(2));
        CameraPose result = CameraPose{rotationVector, translationVector};
    
        return result;
    }
    
    + (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array withSize: (CGSize) size {
        vector<Point2f> points;
        for (NSValue * value in array) {
            CGPoint point = [value CGPointValue];
            points.push_back(Point2f(point.x - size.width/2, point.y - size.height/2));
        }
        return points;
    }
    
    + (vector<Point3f>) convertObjectPoints: (NSArray<NSValue *> *) array {
        vector<Point3f> points;
        for (NSValue * value in array) {
            CGPoint point = [value CGPointValue];
            points.push_back(Point3f(point.x, 0.0, -point.y));
        }
        return points;
    }
    
    + (cv::Mat) intrinsicMatrixWithImageSize: (CGSize) imageSize {
        double f = 0.84 * max(imageSize.width, imageSize.height);
        Mat result(3,3,cv::DataType<double>::type);
        cv::setIdentity(result);
        result.at<double>(0) = f;
        result.at<double>(4) = f;
        return result;
    }
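
    Two details of this matrix are worth spelling out. The image points are centered in convertImagePoints:withSize:, which is consistent with leaving the principal point at (0, 0) here. And the constant 0.84 is simply a focal length in pixels: with f = 0.84 * W the implied field of view along the longer image side is

        \mathrm{fov} = 2 \arctan\!\left(\frac{W}{2f}\right) = 2 \arctan\!\left(\frac{1}{1.68}\right) \approx 61.5^\circ,

    which is in the same ballpark as the 66° xFov used for the reference SCNCamera.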
    

    Usage in Swift:

    func testSolvePnP() {
        let source = modelPoints().map { NSValue(cgPoint: $0) }
        let destination = perspectivePicker.currentPerspective.map { NSValue(cgPoint: $0)}
    
        let cameraPose = CameraPoseDetector.findCameraPose(source, imagePoints: destination, size: backgroundImageView.size)
        cameraNode.rotation = cameraPose.rotationVector
        cameraNode.position = cameraPose.translationVector
    }
    

    Output:

    The result is better but far from my expectations.

    Some other things I've also tried:

    1. This question is very similar, though I don't understand how the accepted answer works without intrinsics.
    2. decomposeHomographyMat also didn't give me the result I expected (a sketch of how it can be called is shown after this list).
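
    For reference, here is a minimal sketch of calling decomposeHomographyMat for the same H and K (the function name and the handling of the results are illustrative; OpenCV returns up to four (R, t, n) candidates and the physically plausible one still has to be picked):

    #include <opencv2/calib3d.hpp>
    #include <vector>

    // Decompose the plane-to-image homography H with intrinsics K into candidate poses.
    std::vector<cv::Mat> posesFromHomography(const cv::Mat& H, const cv::Mat& K)
    {
        std::vector<cv::Mat> rotations, translations, normals;
        int n = cv::decomposeHomographyMat(H, K, rotations, translations, normals);

        std::vector<cv::Mat> poses;
        for (int i = 0; i < n; ++i) {
            cv::Mat Rt;
            cv::hconcat(rotations[i], translations[i], Rt);   // 3x4 [R | t]
            poses.push_back(Rt);
        }
        return poses;
    }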

    I am really stuck with this issue so any help would be much appreciated.

    Solution

    Actually I was one step away from the working solution with OpenCV.

    My problem with the second approach was that I forgot to convert the output from solvePnP back to SceneKit's coordinate system.

    Note that the input (image and world points) was actually converted correctly to the OpenCV coordinate system (the convertObjectPoints: and convertImagePoints:withSize: methods).

    So here is a fixed findCameraPose method with some comments and intermediate results printed:

    + (CameraPose)findCameraPose: (NSArray<NSValue *> *) objectPoints imagePoints: (NSArray<NSValue *> *) imagePoints size: (CGSize) size {
    
        vector<Point3f> cvObjectPoints = [self convertObjectPoints:objectPoints];
        vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints withSize: size];
    
        std::cout << "object points: " << cvObjectPoints << std::endl;
        std::cout << "image points: " << cvImagePoints << std::endl;
    
        cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
        cv::Mat rvec(3,1,cv::DataType<double>::type);
        cv::Mat tvec(3,1,cv::DataType<double>::type);
        cv::Mat cameraMatrix = [self intrinsicMatrixWithImageSize: size];
    
        cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec);
    
        std::cout << "rvec: " << rvec << std::endl;
        std::cout << "tvec: " << tvec << std::endl;
    
        std::vector<cv::Point2f> projectedPoints;
        cvObjectPoints.push_back(Point3f(0.0, 0.0, 0.0));
        cv::projectPoints(cvObjectPoints, rvec, tvec, cameraMatrix, distCoeffs, projectedPoints);
    
        for(unsigned int i = 0; i < projectedPoints.size(); ++i) {
            std::cout << "Image point: " << cvImagePoints[i] << " Projected to " << projectedPoints[i] << std::endl;
        }
    
    
        cv::Mat RotX(3, 3, cv::DataType<double>::type);
        cv::setIdentity(RotX);
        RotX.at<double>(4) = -1; //cos(180) = -1
        RotX.at<double>(8) = -1;
    
        cv::Mat R;
        cv::Rodrigues(rvec, R);
    
        R = R.t();                   // invert the rotation: camera orientation in world coordinates
        Mat rvecConverted;
        Rodrigues(R, rvecConverted); // back to a rotation vector
        std::cout << "rvec in world coords: " << rvecConverted << std::endl;
        rvecConverted = RotX * rvecConverted;
        std::cout << "rvec scenekit: " << rvecConverted << std::endl;

        Mat tvecConverted = -R * tvec;
        std::cout << "tvec in world coords: " << tvecConverted << std::endl;
        tvecConverted = RotX * tvecConverted;
        std::cout << "tvec scenekit: " << tvecConverted << std::endl;
    
        SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
        SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));
    
        return CameraPose{rotationVector, translationVector};
    }
    

    Notes:

    1. The RotX matrix is a rotation by 180 degrees around the x axis, which transforms any vector from OpenCV's coordinate system to SceneKit's (the relations are written out after these notes).

    2. The Rodrigues method converts a rotation vector into a rotation matrix (3x3) and vice versa.
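
    In formulas: if solvePnP returns [R | t] mapping world points into the OpenCV camera frame, then the camera pose in world coordinates is

        R_{world} = R^\top, \qquad t_{world} = -R^\top t,

    and applying S = diag(1, -1, -1) (the 180° rotation about x above) to both the rotation vector and the translation moves the pose from OpenCV's x-right / y-down / z-forward convention to SceneKit's x-right / y-up / z-toward-viewer one. This is what the code does with R.t(), -R * tvec and RotX.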
