Extract transform and rotation matrices from homography?


Problem Description

I have 2 consecutive images from a camera and I want to estimate the change in camera pose:

I calculate the optical flow:

Const MAXFEATURES As Integer = 100
imgA = New Image(Of [Structure].Bgr, Byte)("pic1.bmp")
imgB = New Image(Of [Structure].Bgr, Byte)("pic2.bmp")
grayA = imgA.Convert(Of Gray, Byte)()
grayB = imgB.Convert(Of Gray, Byte)()
imagesize = cvGetSize(grayA)
pyrBufferA = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
pyrBufferB = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
features = MAXFEATURES
featuresA = grayA.GoodFeaturesToTrack(features, 0.01, 25, 3)
grayA.FindCornerSubPix(featuresA, New System.Drawing.Size(10, 10),
                       New System.Drawing.Size(-1, -1),
                       New Emgu.CV.Structure.MCvTermCriteria(20, 0.03))
features = featuresA(0).Length
Emgu.CV.OpticalFlow.PyrLK(grayA, grayB, pyrBufferA, pyrBufferB, _
                          featuresA(0), New Size(25, 25), 3, _
                          New Emgu.CV.Structure.MCvTermCriteria(20, 0.03D),
                          flags, featuresB(0), status, errors)
pointsA = New Matrix(Of Single)(features, 2)
pointsB = New Matrix(Of Single)(features, 2)
For i As Integer = 0 To features - 1
    pointsA(i, 0) = featuresA(0)(i).X
    pointsA(i, 1) = featuresA(0)(i).Y
    pointsB(i, 0) = featuresB(0)(i).X
    pointsB(i, 1) = featuresB(0)(i).Y
Next
Dim Homography As New Matrix(Of Double)(3, 3)
cvFindHomography(pointsA.Ptr, pointsB.Ptr, Homography, HOMOGRAPHY_METHOD.RANSAC, 1, 0)

and it looks right; the camera moved leftwards and upwards. Now I want to find out how much the camera moved and rotated. If I declare my camera position and what it's looking at:

' Create camera location at origin and lookat (straight ahead, 1 in the Z axis)
location = New Matrix(Of Double)(2, 3)
location(0, 0) = 0 ' X location
location(0, 1) = 0 ' Y location
location(0, 2) = 0 ' Z location
location(1, 0) = 0 ' X lookat
location(1, 1) = 0 ' Y lookat
location(1, 2) = 1 ' Z lookat

How do I calculate the new position and lookat?

If I'm doing this all wrong or if there's a better method, any suggestions would be very welcome, thanks!

Recommended Answer

Well, what you're looking at is, in simple terms, a Pythagorean theorem problem, a^2 + b^2 = c^2. However, when it comes to camera-based applications, things are not very easy to determine accurately. You have found half of the detail you need for "a"; however, finding "b" or "c" is much harder.

The Short Answer

Basically, it can't be done with a single camera. But it can be done with two cameras.

The Long Winded Answer (Thought I'd explain in more depth, no pun intended)

I'll try to explain. Say we select two points within our image and move the camera left. We know the distance from the camera of each point: B1 is 20mm away and B2 is 40mm away. Now let's assume that we process the image and our measurements are A1 = (0,2) and A2 = (0,4), related to B1 and B2 respectively. Note that A1 and A2 are not measurements; they are pixels of movement.

What we now have to do is multiply the change in A1 and A2 by a calculated constant, which will be the real-world distance at B1 and B2. NOTE: each of these constants is different according to the measured distance B*. This all relates to the angle of view, more commonly called the field of view in photography, at different distances. You can calculate the constant accurately if you know the size of each pixel on the camera's CCD and the focal length of the lens inside the camera.

I would expect this isn't the case, so at different distances you have to place an object whose length you know and see how many pixels it takes up (close up, you can use a ruler to make things easier). With these measurements you take the data and form a curve with a line of best fit, where the X-axis is the distance of the object and the Y-axis is the pixel-to-distance constant that you must multiply your movement by.
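As an illustration of that calibration step (not part of the original answer), here is a minimal VB.NET sketch that linearly interpolates between a few hypothetical distance / mm-per-pixel calibration points to get the constant at an arbitrary depth; the table values are invented placeholders and would come from your own measurements:

' Hypothetical calibration table: object distance (mm) vs. measured mm-per-pixel constant
Dim calibDistance() As Double = {100, 500, 1000, 2000}
Dim calibMmPerPx() As Double = {0.2, 1.1, 2.3, 4.8}

Function MmPerPixelAt(ByVal depthMm As Double) As Double
    ' Clamp to the ends of the calibrated range
    If depthMm <= calibDistance(0) Then Return calibMmPerPx(0)
    If depthMm >= calibDistance(calibDistance.Length - 1) Then Return calibMmPerPx(calibMmPerPx.Length - 1)
    ' Linear interpolation between the two nearest calibration points
    For j As Integer = 1 To calibDistance.Length - 1
        If depthMm <= calibDistance(j) Then
            Dim t As Double = (depthMm - calibDistance(j - 1)) / (calibDistance(j) - calibDistance(j - 1))
            Return calibMmPerPx(j - 1) + t * (calibMmPerPx(j) - calibMmPerPx(j - 1))
        End If
    Next
    Return calibMmPerPx(calibMmPerPx.Length - 1)
End Function

' Real-world movement of one tracked point = pixel movement * constant at its depth, e.g.
' Dim realDxMm As Double = pixelDx * MmPerPixelAt(pointDepthMm)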

So how do we apply this curve? Well, it's guess work. In theory, the larger the measured movement A*, the closer the object is to the camera. In our example, the constants for A1 and A2 are, say, 5mm and 3mm per pixel respectively, so we would now know that point B1 has moved 10mm (2 x 5mm) and B2 has moved 12mm (4 x 3mm). But let's face it - we will never know B, and we will never be able to tell whether a movement of 20 pixels is an object close up not moving far or an object far away moving a much greater distance. This is why things like the Xbox Kinect use additional sensors to get depth information that can be tied to the objects within the image.

What you are attempting could be done with two cameras: since the distance between the cameras is known, the movement can be calculated much more accurately (effectively working like a depth sensor). The maths behind this is extremely complex and I would suggest looking up some journal papers on the subject. If you would like me to explain the theory, I can attempt to.
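For reference, the core of the two-camera case is the standard depth-from-disparity relation for a rectified stereo pair, Z = f * B / d. A minimal sketch (the function and variable names are my own):

' Depth from a rectified stereo pair: Z = f * B / d
'   focalPx     - focal length expressed in pixels
'   baselineMm  - distance between the two camera centres, in mm
'   disparityPx - horizontal pixel offset of the same point between the two images
Function DepthFromDisparity(ByVal focalPx As Double, ByVal baselineMm As Double, _
                            ByVal disparityPx As Double) As Double
    If disparityPx <= 0 Then Return Double.PositiveInfinity ' zero disparity = point at infinity
    Return focalPx * baselineMm / disparityPx ' depth in mm
End Function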

All my experience comes from designing high-speed video acquisition and image processing for my PhD, so trust me, it can't be done with one camera, sorry. I hope some of this helps.

Cheers

Chris

I was going to add a comment but this is easier due to the bulk of information:

Since it is the Kinect, I will assume you have some relevant depth information associated with each point; if not, you will need to figure out how to get it.

The equation you will need to start off with is the field of view (FOV) relation:

o/d = i/f

where:

f is the focal length of the lens, usually given in mm (18, 28, 30, 50 are standard examples)

d is the object distance from the lens, gathered from the Kinect data

o is the object dimension (or "field of view" perpendicular to and bisected by the optical axis)

i is the image dimension (or "field stop" perpendicular to and bisected by the optical axis)

We need to calculate i (which is a diagonal measurement), since o is our unknown.

To do this we will need the size of a pixel on the CCD; this will be in micrometres (µm), and you will need to find this information out. For now we will take it as 14um, which is standard for a midrange area scan camera.

So first we need to work out the horizontal dimension of i (ih), which is the number of pixels across the width of the sensor multiplied by the size of a CCD pixel (we will use a resolution of 640 x 320):

so: ih = 640 * 14um = 8960um
       = 8960 / 1000 = 8.96mm

Now we need the vertical dimension of i (iv); same process, but with the height:

so: iv = (320 * 14um) / 1000 = 4.48mm

Now i is found by the Pythagorean theorem, a^2 + b^2 = c^2:

so: i = sqrt(ih^2 + iv^2)
      = 10.02 mm

Now we will assume we have a 28mm lens. Again, this exact value will have to be found out. So our equation, rearranged to give us o, is:

o = (i * d) / f

Remember that o will be diagonal (we will assume our object or point is 50mm away):

o = (10.02mm * 50mm) / 28mm
  = 17.89mm

Now we need to work out the o horizontal dimension (oh) and the o vertical dimension (ov), as this will give us the distance per pixel that the object has moved. Since the FOV is directly proportional to the CCD size, that is, o is directly proportional to i, we will work out a ratio k:

k = i / o
  = 10.02 / 17.89
  = 0.56

so:

o horizontal dimension (oh):

oh = ih / k
   = 8.96mm / 0.56 = 16mm per pixel

o vertical dimension (ov):

ov = iv / k
   = 4.48mm / 0.56 = 8mm per pixel

Now that we have the constants we require, let's use them in an example. If our object at 50mm moves from position (0,0) to (2,4), then the measurements in real life are:

(2 * 16mm, 4 * 8mm) = (32mm, 32mm)

Again, the Pythagorean theorem, a^2 + b^2 = c^2:

Total distance = sqrt(32^2 + 32^2)
               = 45.25mm

Complicated, I know, but once you have this in a program it's easier. So for every point you will have to repeat at least half the process, as d, and therefore o, will change for every point you're examining.
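As a minimal sketch of that per-point program step, here is the worked example above written out in VB.NET, using its assumed values (14um CCD pixels, a 640 x 320 sensor, a 28mm lens and a point 50mm away); the variable names are mine:

' Sensor and lens assumptions taken from the worked example above
Const PixelSizeMm As Double = 14 / 1000.0   ' 14um per CCD pixel, converted to mm
Const WidthPx As Integer = 640
Const HeightPx As Integer = 320
Const FocalMm As Double = 28.0

Dim dMm As Double = 50.0                                ' depth of this point (e.g. from the Kinect)
Dim ihMm As Double = WidthPx * PixelSizeMm              ' 8.96mm
Dim ivMm As Double = HeightPx * PixelSizeMm             ' 4.48mm
Dim iMm As Double = Math.Sqrt(ihMm ^ 2 + ivMm ^ 2)      ' 10.02mm (sensor diagonal)

Dim oMm As Double = iMm * dMm / FocalMm                 ' 17.89mm, from o = (i * d) / f
Dim k As Double = iMm / oMm                             ' 0.56

Dim ohMm As Double = ihMm / k                           ' 16mm, applied per pixel of horizontal movement above
Dim ovMm As Double = ivMm / k                           ' 8mm, applied per pixel of vertical movement above

' Pixel movement (2, 4) converted to real-world movement and total distance
Dim moveXmm As Double = 2 * ohMm                        ' 32mm
Dim moveYmm As Double = 4 * ovMm                        ' 32mm
Dim totalMm As Double = Math.Sqrt(moveXmm ^ 2 + moveYmm ^ 2)  ' 45.25mm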

Hope this helps.

Cheers, Chris

