Extract transform and rotation matrices from homography?


Problem Description

I have 2 consecutive images from a camera and I want to estimate the change in camera pose:

I calculate the optical flow:

Const MAXFEATURES As Integer = 100
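' Declarations omitted from the original snippet; the types below are an
' assumption based on the Emgu CV 2.x API used here.
Dim imgA, imgB As Image(Of [Structure].Bgr, Byte)
Dim grayA, grayB As Image(Of Gray, Byte)
Dim pyrBufferA, pyrBufferB As Emgu.CV.Image(Of Gray, Byte)
Dim imagesize As System.Drawing.Size
Dim features As Integer
Dim featuresA()() As System.Drawing.PointF
Dim featuresB(0)() As System.Drawing.PointF
Dim flags As Emgu.CV.CvEnum.LKFLOW_TYPE
Dim status() As Byte, errors() As Single
Dim pointsA As Matrix(Of Single), pointsB As Matrix(Of Single)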
imgA = New Image(Of [Structure].Bgr, Byte)("pic1.bmp")
imgB = New Image(Of [Structure].Bgr, Byte)("pic2.bmp")
grayA = imgA.Convert(Of Gray, Byte)()
grayB = imgB.Convert(Of Gray, Byte)()
imagesize = cvGetSize(grayA)
pyrBufferA = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
pyrBufferB = New Emgu.CV.Image(Of Emgu.CV.Structure.Gray, Byte) _
    (imagesize.Width + 8, imagesize.Height / 3)
features = MAXFEATURES
featuresA = grayA.GoodFeaturesToTrack(features, 0.01, 25, 3)
grayA.FindCornerSubPix(featuresA, New System.Drawing.Size(10, 10),
                       New System.Drawing.Size(-1, -1),
                       New Emgu.CV.Structure.MCvTermCriteria(20, 0.03))
features = featuresA(0).Length
Emgu.CV.OpticalFlow.PyrLK(grayA, grayB, pyrBufferA, pyrBufferB, _
                          featuresA(0), New Size(25, 25), 3, _
                          New Emgu.CV.Structure.MCvTermCriteria(20, 0.03D),
                          flags, featuresB(0), status, errors)
pointsA = New Matrix(Of Single)(features, 2)
pointsB = New Matrix(Of Single)(features, 2)
For i As Integer = 0 To features - 1
    pointsA(i, 0) = featuresA(0)(i).X
    pointsA(i, 1) = featuresA(0)(i).Y
    pointsB(i, 0) = featuresB(0)(i).X
    pointsB(i, 1) = featuresB(0)(i).Y
Next
Dim Homography As New Matrix(Of Double)(3, 3)
cvFindHomography(pointsA.Ptr, pointsB.Ptr, Homography, HOMOGRAPHY_METHOD.RANSAC, 1, 0)

and it looks right; the camera moved leftwards and upwards. Now I want to find out how much the camera moved and rotated. If I declare my camera position and what it's looking at:

' Create camera location at origin and lookat (straight ahead, 1 in the Z axis)
Location = New Matrix(Of Double)(2, 3)
location(0, 0) = 0 ' X location
location(0, 1) = 0 ' Y location
location(0, 2) = 0 ' Z location
location(1, 0) = 0 ' X lookat
location(1, 1) = 0 ' Y lookat
location(1, 2) = 1 ' Z lookat

How do I calculate the new position and lookat?

If I'm doing this all wrong or if there's a better method, any suggestions would be very welcome, thanks!

Solution

Well, what you're looking at is, in simple terms, a Pythagorean theorem problem: a^2 + b^2 = c^2. However, when it comes to camera-based applications, things are not easy to determine accurately. You have found half of the detail you need for "a", but finding "b" or "c" is much harder.

The Short Answer

Basically, it can't be done with a single camera. But it can be done with two cameras.

The Long-Winded Answer (I thought I'd explain in more depth, no pun intended)

I'll try to explain. Say we select two points within our image and move the camera left. We know the distance from the camera to each point: B1 is 20mm away and B2 is 40mm away. Now let's assume that we process the image and our measurements are that A1 is (0,2) and A2 is (0,4); these relate to B1 and B2 respectively. Note that A1 and A2 are not measurements; they are pixels of movement.

What we now have to do is multiply the change in A1 and A2 by a calculated constant, which will be the real-world distance at B1 and B2. NOTE: each of these constants is different, depending on the distance B*. This all relates to the angle of view, more commonly called the field of view in photography, at different distances. You can calculate the constant accurately if you know the size of each pixel on the camera's CCD and the f-number of the lens inside the camera.

I expect this isn't the case, so at different distances you will have to place an object of known length and see how many pixels it takes up (close up, you can use a ruler to make things easier). From these measurements you form a curve with a line of best fit, where the X-axis is the distance of the object and the Y-axis is the pixel-to-distance constant that you must multiply your movement by; a rough sketch of this calibration step follows.
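As an illustration, here is a minimal VB.NET sketch of that calibration. The sample distances and mm-per-pixel values are placeholders, not real measurements, and a simple least-squares straight-line fit is assumed; a real lens may need a non-linear fit.

Module CalibrationFit
    ' Hypothetical calibration samples: distance of the test object from the
    ' camera (mm) and the measured real-world mm per pixel at that distance.
    ' These numbers are placeholders; gather your own with a ruler.
    Private ReadOnly distMm() As Double = {200, 400, 600, 800, 1000}
    Private ReadOnly mmPerPx() As Double = {0.5, 1.1, 1.6, 2.2, 2.7}

    ' Ordinary least-squares line of best fit: mmPerPx ~ slope * dist + intercept.
    Sub FitLine(ByRef slope As Double, ByRef intercept As Double)
        Dim n As Integer = distMm.Length
        Dim sx, sy, sxy, sxx As Double
        For i As Integer = 0 To n - 1
            sx += distMm(i)
            sy += mmPerPx(i)
            sxy += distMm(i) * mmPerPx(i)
            sxx += distMm(i) ^ 2
        Next
        slope = (n * sxy - sx * sy) / (n * sxx - sx ^ 2)
        intercept = (sy - slope * sx) / n
    End Sub

    Sub Main()
        Dim slope, intercept As Double
        FitLine(slope, intercept)
        ' Estimated mm-per-pixel constant for a point 500mm away,
        ' and the real-world distance for a 2-pixel movement there.
        Dim k500 As Double = slope * 500 + intercept
        Console.WriteLine("{0}mm", 2 * k500)
    End Sub
End Module

At lookup time, multiplying a feature's pixel displacement by the constant estimated at its distance gives the real-world movement in mm.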

So how do we apply this curve? Well, it's guesswork. In theory, the larger the measurement of movement A*, the closer the object is to the camera. In our example the per-pixel constants at B1 and B2 are, say, 5mm and 3mm respectively, so we now know that point B1 has moved 10mm (2 x 5mm) and B2 has moved 12mm (4 x 3mm). But let's face it: we will never know B, and we will never be able to tell whether a movement of 20 pixels is a close object not moving far or a distant object moving a much greater distance. This is why things like the Xbox Kinect use additional sensors to get depth information that can be tied to the objects within the image.

What you are attempting could be done with two cameras: since the distance between the cameras is known, the movement can be calculated much more accurately (effectively without using a depth sensor). The maths behind this is extremely complex, and I would suggest looking up some journal papers on the subject. If you would like me to explain the theory, I can attempt to.

All my experience comes from designing high-speed video acquisition and image processing for my PhD, so trust me, it can't be done with one camera, sorry. I hope some of this helps.

Cheers

Chris

[EDIT]

I was going to add a comment but this is easier due to the bulk of information:

Since it is the Kinect, I will assume you have some relevant depth information associated with each point; if not, you will need to figure out how to get it.

The equation you will need to start off with is the field of view (FOV) relation:

o/d = i/f

Where:

f is the focal length of the lens, usually given in mm (e.g. 18, 28, 30 and 50 are standard examples)

d is the distance of the object from the lens, gathered from the Kinect data

o is the object dimension (or "field of view" perpendicular to and bisected by the optical axis).

i is the image dimension (or "field stop" perpendicular to and bisected by the optical axis).

We need to calculate i, since o is our unknown. For i (which is a diagonal measurement):

We will need the size of a pixel on the CCD, in micrometres (µm); you will need to find this information out. For now we will take it as 14µm, which is standard for a midrange area-scan camera.

So first we need to work out the i horizontal dimension (ih), which is the number of pixels across the width of the sensor multiplied by the size of a CCD pixel (we will use 640 x 320):

so: ih = 640*14um = 8960um

   = 8960/1000 = 8.96mm

Now we need the i vertical dimension (iv); it is the same process, but with the height:

so: iv = (320 * 14um) / 1000 = 4.48mm

Now i is found by the Pythagorean theorem, a^2 + b^2 = c^2:

so: i = sqrt(ih^2 + iv^2)

  = 10.02 mm

Now we will assume we have a 28mm lens. Again, this exact value will have to be found out. So our equation rearranged to give us o is:

o = (i * d) / f

Remember o will be diagonal (we will assume our object or point is 50mm away):

o = (10.02mm * 50mm) / 28mm

  = 17.89mm

Now we need to work out the o horizontal dimension (oh) and the o vertical dimension (ov), as these will give us the distance per pixel that the object has moved. Since the FOV is directly proportional to the CCD size, i.e. i is directly proportional to o, we will work out a ratio k:

k = i/o

= 10.02 / 17.89 

= 0.56

so:

o horizontal dimension (oh):

oh = ih / k

= 8.96mm / 0.56 = 16mm per pixel

o vertical dimension (ov):

ov = iv / k

= 4.48mm / 0.56 = 8mm per pixel

Now that we have the constants we require, let's use them in an example. If our object at 50mm moves from position (0,0) to (2,4), then the measurements in real life are:

(2*16mm , 4*8mm) = (32mm,32mm)

Again, the Pythagorean theorem: a^2 + b^2 = c^2

Total distance = sqrt(32^2 + 32^2)

           = 45.25mm
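
For reference, here is a minimal VB.NET sketch of the whole calculation above, using the same assumed values (14µm pixels, a 640 x 320 sensor and a 28mm lens); substitute your own camera's figures:

Module FovExample
    ' Sensor/lens parameters assumed in the worked example above.
    Const PixelSizeUm As Double = 14.0   ' CCD pixel size in micrometres
    Const SensorW As Integer = 640       ' sensor width in pixels
    Const SensorH As Integer = 320       ' sensor height in pixels
    Const FocalMm As Double = 28.0       ' lens focal length in mm

    ' Image (sensor) dimensions in mm: ih, iv and the diagonal i.
    Private ReadOnly ihMm As Double = SensorW * PixelSizeUm / 1000.0  ' 8.96mm
    Private ReadOnly ivMm As Double = SensorH * PixelSizeUm / 1000.0  ' 4.48mm
    Private ReadOnly iMm As Double = Math.Sqrt(ihMm ^ 2 + ivMm ^ 2)   ' ~10.02mm

    ' Real-world movement (mm) of a point at depth dMm that moved (dx, dy) pixels.
    Function RealMovementMm(dMm As Double, dx As Double, dy As Double) As Double
        Dim oMm As Double = iMm * dMm / FocalMm  ' o = (i * d) / f, diagonal FOV
        Dim k As Double = iMm / oMm              ' k = i / o
        Dim mxMm As Double = dx * (ihMm / k)     ' horizontal movement, oh = ih / k
        Dim myMm As Double = dy * (ivMm / k)     ' vertical movement, ov = iv / k
        Return Math.Sqrt(mxMm ^ 2 + myMm ^ 2)    ' Pythagorean total distance
    End Function

    Sub Main()
        ' The example above: a point 50mm away moving (2,4) pixels -> ~45.25mm.
        Console.WriteLine("{0}mm", RealMovementMm(50, 2, 4))
    End Sub
End Module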

Complicated, I know, but once you have this in a program it's easier. So for every point you will have to repeat at least half the process, since d, and therefore o, changes for every point you examine.

Hope this gets you on your way,

Cheers Chris
