3D trajectory reconstruction from video (taken by a single camera)


Question

I am currently trying to reconstruct a 3D trajectory of a falling object like a ball or a rock out of a sequence of images taken from an iPhone video.

Where should I start looking? I know I have to calibrate the camera (I think I'll use the MATLAB calibration toolbox by Jean-Yves Bouguet) and then find the vanishing point from the same sequence, but then I'm really stuck.

Solution

Read this: http://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/773-GG/lectA-773.htm. It explains 3D reconstruction using two cameras. For a simple summary, look at the figure from that site:

[figure from the linked page: two cameras with focal points Ol/Or viewing a world point P, which projects to image points pl/pr]

You only know pr/pl, the image points. By tracing a line from each through its respective focal point Or/Ol, you get two lines (Pr/Pl) that both contain the point P. Because you know the origins and orientations of the two cameras, you can construct 3D equations for these lines. Their intersection is the 3D point; voila, it's that simple.

But when you discard one camera (let's say the left one), you only know the line Pr for sure. What's missing is depth. Luckily you know the radius of your ball; this extra information can supply the missing depth. See the next figure (don't mind my paint skills):

[figure: a single camera ray through the ball; the known true ball radius fixes the depth by similar triangles]

Now you can recover the depth using the intercept theorem (similar triangles): the ball's true radius relates to its projected radius exactly as its depth relates to the focal length.
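As a minimal sketch of that step under a pinhole model: the projected radius is r_px = f * R_ball / Z, so Z = f * R_ball / r_px. The focal length and radii below are made-up illustration values, not numbers from the question.

```python
# Depth from apparent size via the intercept theorem (similar triangles).
# Pinhole model: r_px = f * R_ball / Z  =>  Z = f * R_ball / r_px.

def depth_from_radius(f_px, r_ball_m, r_img_px):
    """f_px: focal length in pixels (from calibration),
    r_ball_m: true ball radius in metres,
    r_img_px: measured ball radius in the image, in pixels."""
    return f_px * r_ball_m / r_img_px

# Made-up example values: f = 1500 px, a football-sized ball (0.11 m radius)
# that appears 33 px wide in radius.
z = depth_from_radius(f_px=1500.0, r_ball_m=0.11, r_img_px=33.0)
print(z)  # prints 5.0 (metres)
```

The same relation run in reverse (projecting a known depth to a pixel radius) is a handy sanity check on your calibration.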

I see one last issue: the shape of the ball changes when projected at an angle (i.e. not perpendicular to your capture plane). However, you do know the angle, so compensation is possible, but I leave that up to you :p

edit: @ripkars' comment (comment box was too small)

1) ok

2) aha, the correspondence problem :D Typically solved by correlation analysis or matching features (mostly matching followed by tracking in a video). (other methods exist too) I haven't used the image/vision toolbox myself, but there should definitely be some things to help you on the way.
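As a sketch of the correlation-analysis route mentioned above: slide a small template (the ball patch from one frame) over the next frame and pick the window with the highest normalized cross-correlation. This is a deliberately naive brute-force version in plain NumPy on a made-up synthetic frame; a real pipeline would use the toolbox's (or OpenCV's) matchers, which are far faster and more robust.

```python
import numpy as np

def match_template(image, template):
    """Return (row, col) of the window in `image` most correlated with
    `template`, using brute-force normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            w = image[y:y + th, x:x + tw]
            wc = w - w.mean()
            denom = np.sqrt((wc ** 2).sum() * (t ** 2).sum())
            score = (wc * t).sum() / denom if denom > 1e-12 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos

# Tiny synthetic frame with a non-uniform "ball" blob at row 5, col 7.
frame = np.zeros((16, 16))
blob = np.array([[0., 1., 0.], [1., 2., 1.], [0., 1., 0.]])
frame[5:8, 7:10] = blob
print(match_template(frame, blob))  # prints (5, 7)
```

Running this per frame, seeded near the previous hit, is the "matching followed by tracking" idea in its simplest form.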

3) = calibration of your cameras. Normally you only do this once, when installing the cameras (and again any time you change their relative pose)

4) yes, just put the Longuet-Higgins equation to work, ie: solve

P = C1 + mu1*R1*K1^(-1)*p1
P = C2 + mu2*R2*K2^(-1)*p2

with:

- P = the 3D point to find
- C1, C2 = the camera centers (vectors)
- R1, R2 = rotation matrices expressing each camera's orientation in the world frame
- K1, K2 = the calibration matrices of the cameras (containing the internal parameters, not to be confused with the external parameters contained in R and C)
- p1, p2 = the image points
- mu1, mu2 = parameters expressing the position of P along the projection line from the camera center C to P (if I'm correct, R*K^(-1)*p expresses a line direction/vector pointing from C to P)

These are 6 equations (two vector equations with 3 components each) in 5 unknowns: mu1, mu2 and the three coordinates of P.
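That slightly overdetermined system can be solved in one least-squares step. The sketch below stacks the two equations as P - mu_i*d_i = C_i with d_i = R_i*K_i^(-1)*p_i already folded into a direction vector; the camera centers and viewing directions are made-up, noise-free illustration values.

```python
import numpy as np

def triangulate(C1, d1, C2, d2):
    """Solve  P = C1 + mu1*d1  and  P = C2 + mu2*d2  jointly, i.e.
    P - mu_i*d_i = C_i: 6 equations in the 5 unknowns (mu1, mu2, P)."""
    A = np.zeros((6, 5))
    A[0:3, 0] = -d1          # -mu1 * d1
    A[3:6, 1] = -d2          # -mu2 * d2
    A[0:3, 2:5] = np.eye(3)  # +P (first camera's equation)
    A[3:6, 2:5] = np.eye(3)  # +P (second camera's equation)
    b = np.concatenate([C1, C2])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[2:5]            # the reconstructed 3D point P

# Made-up rig: two cameras 1 m apart on the x-axis, both seeing a point
# at (0.5, 0, 5). The directions d_i are ideal (noise-free) here.
P_true = np.array([0.5, 0.0, 5.0])
C1, C2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
d1, d2 = P_true - C1, P_true - C2
print(triangulate(C1, d1, C2, d2))  # recovers (0.5, 0.0, 5.0)
```

With noisy image points the two rays no longer intersect exactly, and the least-squares solution gives the point closest to both rays, which is exactly what you want.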

edit: @ripkars' comment (comment box too small once again) The only computer vision library that pops up in my mind is OpenCV (http://opencv.willowgarage.com/wiki). But that's a C library, not MATLAB... I guess Google is your friend ;)

About the calibration: yes, if those two images contain enough information to match some features. If you change the relative pose of the cameras, you'll have to recalibrate of course.

The choice of the world frame is arbitrary; it only becomes important when you want to analyze the retrieved 3D data afterwards. For example, you could align one of the world planes with the plane of motion, which simplifies the motion equations if you want to fit one. The world frame is just a reference frame, changeable with a 'change of reference frame' transformation (a translation and/or rotation).
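Such a change of reference frame is just a rotation plus a translation applied to every reconstructed point. A minimal sketch (the rotation angle and translation below are arbitrary illustration values, not anything from the question):

```python
import numpy as np

def change_frame(points, R, t):
    """Express Nx3 `points` in a new reference frame: p' = R @ p + t."""
    return points @ R.T + t

# Made-up example: rotate 90 degrees about the x-axis, then shift by 1 m in z.
theta = np.pi / 2
R = np.array([[1.0, 0.0,            0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta),  np.cos(theta)]])
t = np.array([0.0, 0.0, 1.0])
pts = np.array([[0.0, 1.0, 0.0]])
print(change_frame(pts, R, t))  # maps (0, 1, 0) to approximately (0, 0, 2)
```

Applying the inverse transform (R.T, -R.T @ t) takes the points back, so nothing is lost by picking whichever frame makes the fit easiest.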
