How do I reverse-project 2D points into 3D?

Question

I have 4 2D points in screen-space, and I need to reverse-project them back into 3D space. I know that each of the 4 points is a corner of a 3D-rotated rigid rectangle, and I know the size of the rectangle. How can I get 3D coordinates from this?

I am not using any particular API, and I do not have an existing projection matrix. I'm just looking for basic math to do this. Of course there isn't enough data to convert a single 2D point to 3D with no other reference, but I imagine that if you have 4 points, you know that they're all at right-angles to each other on the same plane, and you know the distance between them, you should be able to figure it out from there. Unfortunately I can't quite work out how though.

This might fall under the umbrella of photogrammetry, but Google searches for that haven't led me to any helpful information.

Solution

Alright, I came here looking for an answer and didn't find something simple and straightforward, so I went ahead and did the dumb but effective (and relatively simple) thing: Monte Carlo optimisation.

Very simply put, the algorithm is as follows: Randomly perturb your projection matrix until it projects your known 3D coordinates to your known 2D coordinates.

Here is a still photo from Thomas the Tank Engine:

Let's say we use GIMP to find the 2D coordinates of what we think is a square on the ground plane (whether or not it is really a square depends on your judgment of the depth):

I get four points in the 2D image: (318, 247), (326, 312), (418, 241), and (452, 303).

By convention, we say that these points should correspond to the 3D points: (0, 0, 0), (0, 0, 1), (1, 0, 0), and (1, 0, 1). In other words, a unit square in the y=0 plane.

Projecting each of these 3D coordinates into 2D is done by multiplying the 4D vector [x, y, z, 1] with a 4x4 projection matrix, then dividing the resulting x and y components by the resulting fourth component, w, to actually get the perspective correction (for a standard perspective matrix, w is proportional to the depth z, which is why this is often described as dividing by z). This is more or less what gluProject() does, except that gluProject() also takes the current viewport and a separate modelview matrix into account (we can just assume the modelview matrix is the identity matrix). It is very handy to look at the gluProject() documentation because I actually want a solution that works for OpenGL, but beware that the documentation is missing this division in the formula.

Remember, the algorithm is to start with some projection matrix and randomly perturb it until it gives the projection that we want. So what we're going to do is project each of the four 3D points and see how close we get to the 2D points we wanted. If our random perturbations cause the projected 2D points to get closer to the ones we marked above, then we keep that matrix as an improvement over our initial (or previous) guess.

Let's define our points:

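from collections import namedtuple

# Minimal point types with .x, .y (and .z) attributes
Point2 = namedtuple("Point2", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

# Known 2D coordinates of our rectangle
i0 = Point2(318, 247)
i1 = Point2(326, 312)
i2 = Point2(418, 241)
i3 = Point2(452, 303)

# 3D coordinates corresponding to i0, i1, i2, i3
r0 = Point3(0, 0, 0)
r1 = Point3(0, 0, 1)
r2 = Point3(1, 0, 0)
r3 = Point3(1, 0, 1)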

We need to start with some matrix; the identity matrix seems a natural choice:

mat = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

We need to actually implement the projection (which is basically a matrix multiplication):

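def project(p, mat):
    # Rows 0, 1 and 3 of mat give the homogeneous x, y and w; row 2 (z) is not needed
    x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3] * 1
    y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3] * 1
    w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3] * 1
    # Perspective divide, then map [-1, 1] to pixels; flip y so it counts from the top
    return Point2(720 * (x / w + 1) / 2., 576 - 576 * (y / w + 1) / 2.)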

This is basically what gluProject() does. 720 and 576 are the width and height of the image, respectively (i.e. the viewport), and we subtract from 576 to account for the fact that we counted y coordinates from the top while OpenGL typically counts them from the bottom. You'll notice we're not calculating z; that's because we don't really need it here (though it could be handy to ensure it falls within the range that OpenGL uses for the depth buffer).
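Written out as a formula, the mapping that project() implements looks like this. The symbols are my own shorthand rather than notation from the gluProject() documentation: M is the 4x4 matrix, W = 720 and H = 576 are the image width and height, and (u, v) are pixel coordinates counted from the top-left corner.

```latex
\begin{aligned}
\begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix}
  &= M \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \\
u &= \frac{W}{2}\left(\frac{x'}{w'} + 1\right), \\
v &= H - \frac{H}{2}\left(\frac{y'}{w'} + 1\right).
\end{aligned}
```

Only the first, second and fourth rows of M appear in u and v, which is why z' never has to be computed.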

Now we need a function for evaluating how close we are to the correct solution. The value returned by this function is what we will use to check whether one matrix is better than another. I chose to go by sum of squared distances, i.e.:

# The squared distance between two points a and b
def norm2(a, b):
    dx = b.x - a.x
    dy = b.y - a.y
    return dx * dx + dy * dy

def evaluate(mat): 
    c0 = project(r0, mat)
    c1 = project(r1, mat)
    c2 = project(r2, mat)
    c3 = project(r3, mat)
    return norm2(i0, c0) + norm2(i1, c1) + norm2(i2, c2) + norm2(i3, c3)
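As a quick sanity check of the pieces so far, here is a minimal self-contained sketch that evaluates the identity matrix. It assumes the same 720x576 image and the same four point pairs as above; the namedtuple point types and the upper-case constant names are my own stand-ins, not part of the original snippets.

```python
from collections import namedtuple

# Stand-in point types (assumption: any object with .x/.y/.z attributes would do)
Point2 = namedtuple("Point2", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

WIDTH, HEIGHT = 720, 576  # image size used in the text

IMAGE_POINTS = [Point2(318, 247), Point2(326, 312), Point2(418, 241), Point2(452, 303)]
WORLD_POINTS = [Point3(0, 0, 0), Point3(0, 0, 1), Point3(1, 0, 0), Point3(1, 0, 1)]

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def project(p, mat):
    # Homogeneous x, y, w from rows 0, 1, 3; then perspective divide and viewport map
    x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3]
    y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3]
    w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3]
    return Point2(WIDTH * (x / w + 1) / 2.0, HEIGHT - HEIGHT * (y / w + 1) / 2.0)


def norm2(a, b):
    # Squared distance between two 2D points
    dx = b.x - a.x
    dy = b.y - a.y
    return dx * dx + dy * dy


def evaluate(mat):
    # Sum of squared pixel distances over the four corner pairs
    return sum(norm2(i, project(r, mat)) for i, r in zip(IMAGE_POINTS, WORLD_POINTS))


center = project(Point3(0, 0, 0), IDENTITY)
initial_error = evaluate(IDENTITY)
print(center, initial_error)
```

With the identity matrix the origin lands on the image centre, (360, 288), and the starting error is 170639 squared pixels, so there is plenty of room for the random search to improve on.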

To perturb the matrix, we simply pick an element to perturb by a random amount within some range:

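from copy import deepcopy
from random import randrange, uniform

def perturb(mat, amount):
    # Work on a copy so the caller's best-so-far matrix is left untouched
    mat2 = deepcopy(mat)
    # Nudge one randomly chosen entry by a random amount in [-amount, amount]
    mat2[randrange(4)][randrange(4)] += uniform(-amount, amount)
    return mat2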

(It's worth noting that our project() function doesn't actually use mat[2] at all, since we don't compute z, and since all our y coordinates are 0 the mat[*][1] values are irrelevant as well. We could use this fact and never try to perturb those values, which would give a small speedup, but that is left as an exercise...)
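One possible take on that exercise, as a sketch only: restrict the random choice to the nine entries that project() actually reads for points with y = 0. The name perturb_active and the optional rng parameter are hypothetical additions of mine.

```python
import random
from copy import deepcopy

# Rows 0, 1 and 3 produce x, y and w; columns 0, 2 and 3 multiply p.x, p.z and 1.
# Column 1 only multiplies p.y, which is 0 for every point used here.
ACTIVE_ROWS = (0, 1, 3)
ACTIVE_COLS = (0, 2, 3)


def perturb_active(mat, amount, rng=random):
    """Return a copy of mat with one of the nine active entries nudged."""
    mat2 = deepcopy(mat)
    row = rng.choice(ACTIVE_ROWS)
    col = rng.choice(ACTIVE_COLS)
    mat2[row][col] += rng.uniform(-amount, amount)
    return mat2


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
rng = random.Random(0)  # seeded only so the demo is repeatable
candidate = perturb_active(identity, 0.1, rng)
changed = [(r, c) for r in range(4) for c in range(4)
           if candidate[r][c] != identity[r][c]]
print(changed)
```

This only helps while all the 3D points lie in the y = 0 plane; as soon as a point with a non-zero y is added, column 1 matters again and should go back into ACTIVE_COLS.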

For convenience, let's add a function that does the bulk of the approximation by calling perturb() over and over again on what is the best matrix we've found so far:

def approximate(mat, amount, n=100000):
    est = evaluate(mat)

    for i in range(n):
        mat2 = perturb(mat, amount)
        est2 = evaluate(mat2)
        if est2 < est:
            mat = mat2
            est = est2

    return mat, est
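To try the loop in isolation, here is a self-contained sketch that bundles the functions above and runs a deliberately small budget. The seeded random.Random instance, the explicit rng argument and the iteration counts are my own choices for repeatability, not something the approach requires.

```python
import random
from collections import namedtuple
from copy import deepcopy

Point2 = namedtuple("Point2", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

IMAGE_POINTS = [Point2(318, 247), Point2(326, 312), Point2(418, 241), Point2(452, 303)]
WORLD_POINTS = [Point3(0, 0, 0), Point3(0, 0, 1), Point3(1, 0, 0), Point3(1, 0, 1)]


def project(p, mat, width=720, height=576):
    x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3]
    y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3]
    w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3]
    return Point2(width * (x / w + 1) / 2.0, height - height * (y / w + 1) / 2.0)


def evaluate(mat):
    # Sum of squared pixel distances between marked and projected corners
    total = 0.0
    for i, r in zip(IMAGE_POINTS, WORLD_POINTS):
        c = project(r, mat)
        total += (c.x - i.x) ** 2 + (c.y - i.y) ** 2
    return total


def perturb(mat, amount, rng):
    mat2 = deepcopy(mat)
    mat2[rng.randrange(4)][rng.randrange(4)] += rng.uniform(-amount, amount)
    return mat2


def approximate(mat, amount, n, rng):
    # Keep a candidate only when it strictly lowers the error
    est = evaluate(mat)
    for _ in range(n):
        mat2 = perturb(mat, amount, rng)
        est2 = evaluate(mat2)
        if est2 < est:
            mat, est = mat2, est2
    return mat, est


rng = random.Random(42)
start = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
start_error = evaluate(start)
mat, est = approximate(start, 1, 2000, rng)     # coarse steps first
mat, est = approximate(mat, 0.1, 2000, rng)     # then finer steps
print(start_error, est)
```

Because a candidate is kept only when it lowers the error, the printed final error can never exceed the starting one; how low it gets depends on the seed and the budget.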

Now all that's left to do is to run it...:

for i in range(100):
    mat, est = approximate(mat, 1)
    mat, est = approximate(mat, .1)

I find this already gives a pretty accurate answer. After running for a while, the matrix I found was:

[
    [1.0836000765696232,  0,  0.16272110011060575, -0.44811064935115597],
    [0.09339193527789781, 1, -0.7990570384334473,   0.539087345090207  ],
    [0,                   0,  1,                    0                  ],
    [0.06700844759602216, 0, -0.8333379578853196,   3.875290562060915  ],
]

with an error of around 2.6e-5. (Notice how the elements we said were not used in the computation have not actually been changed from our initial matrix; that's because changing these entries would not change the result of the evaluation and so the change would never get carried along.)
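A simple way to check a matrix like this is to reproject the four corners and compare them with the marked pixels. The sketch below does that for the matrix quoted above, assuming the same project() mapping and 720x576 image; FOUND and the point types are my own names.

```python
from collections import namedtuple

Point2 = namedtuple("Point2", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])

IMAGE_POINTS = [Point2(318, 247), Point2(326, 312), Point2(418, 241), Point2(452, 303)]
WORLD_POINTS = [Point3(0, 0, 0), Point3(0, 0, 1), Point3(1, 0, 0), Point3(1, 0, 1)]

# The matrix quoted in the text, copied verbatim
FOUND = [
    [1.0836000765696232,  0,  0.16272110011060575, -0.44811064935115597],
    [0.09339193527789781, 1, -0.7990570384334473,   0.539087345090207],
    [0,                   0,  1,                    0],
    [0.06700844759602216, 0, -0.8333379578853196,   3.875290562060915],
]


def project(p, mat, width=720, height=576):
    x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3]
    y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3]
    w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3]
    return Point2(width * (x / w + 1) / 2.0, height - height * (y / w + 1) / 2.0)


projected = [project(r, FOUND) for r in WORLD_POINTS]
# Per-corner offset in pixels between the reprojection and the marked point
offsets = [(c.x - i.x, c.y - i.y) for c, i in zip(projected, IMAGE_POINTS)]
for marked, got in zip(IMAGE_POINTS, projected):
    print(marked, "->", got)
```

With this particular mapping, each reprojected corner should come out within about a pixel of the point that was marked in GIMP.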

We can pass the matrix into OpenGL using glLoadMatrix() (but remember to transpose it first, and remember to load your modelview matrix with the identity matrix):

def transpose(m):
    return [
        [m[0][0], m[1][0], m[2][0], m[3][0]],
        [m[0][1], m[1][1], m[2][1], m[3][1]],
        [m[0][2], m[1][2], m[2][2], m[3][2]],
        [m[0][3], m[1][3], m[2][3], m[3][3]],
    ]

glLoadMatrixf(transpose(mat))
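The transpose is needed because OpenGL reads the 16 values given to glLoadMatrixf() in column-major order, while mat above is written out row by row. The following pure-Python sketch (no OpenGL calls; to_column_major is a name I made up) shows that flattening the transposed matrix row by row gives exactly the column-by-column ordering.

```python
def transpose(m):
    # Same result as the hand-written version above, for any 4x4 nested list
    return [list(col) for col in zip(*m)]


def to_column_major(m):
    """Flatten a 4x4 nested list (indexed m[row][col]) column by column."""
    return [m[r][c] for c in range(4) for r in range(4)]


# Each entry encodes its own position as 4 * row + col, so the order is easy to read
m = [[4 * r + c for c in range(4)] for r in range(4)]

flat = to_column_major(m)
row_major_of_transpose = [v for row in transpose(m) for v in row]
print(flat)
```

In the flattened list the last column of the matrix (the one that multiplies the constant 1 in project()) ends up in positions 12 to 15, which is where OpenGL expects the translation part.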

Now we can for example translate along the z axis to get different positions along the tracks:

glTranslate(0, 0, frame)
frame = frame + 1

glBegin(GL_QUADS)
glVertex3f(0, 0, 0)
glVertex3f(0, 0, 1)
glVertex3f(1, 0, 1)
glVertex3f(1, 0, 0)
glEnd()

For sure this is not very elegant from a mathematical point of view; you don't get a closed form equation that you can just plug your numbers into and get a direct (and accurate) answer. HOWEVER, it does allow you to add additional constraints without having to worry about complicating your equations; for example if we wanted to incorporate height as well, we could use that corner of the house and say (in our evaluation function) that the distance from the ground to the roof should be so-and-so, and run the algorithm again. So yes, it's a brute force of sorts, but works, and works well.
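To illustrate how an extra constraint could slot in, here is a sketch where evaluate() takes an arbitrary list of (3D point, 2D pixel) pairs, so a height reference is just one more pair. Everything about the extra pair is hypothetical: the 3D point (0, 1, 0) and its pixel are not measured from the photo, and the pixel is chosen only so that the arithmetic in the demo is easy to follow.

```python
from collections import namedtuple

Point2 = namedtuple("Point2", ["x", "y"])
Point3 = namedtuple("Point3", ["x", "y", "z"])


def project(p, mat, width=720, height=576):
    x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3]
    y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3]
    w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3]
    return Point2(width * (x / w + 1) / 2.0, height - height * (y / w + 1) / 2.0)


def evaluate(mat, pairs):
    """Sum of squared pixel distances over (3D point, 2D pixel) pairs."""
    total = 0.0
    for world, pixel in pairs:
        c = project(world, mat)
        total += (c.x - pixel.x) ** 2 + (c.y - pixel.y) ** 2
    return total


GROUND_PAIRS = [
    (Point3(0, 0, 0), Point2(318, 247)),
    (Point3(0, 0, 1), Point2(326, 312)),
    (Point3(1, 0, 0), Point2(418, 241)),
    (Point3(1, 0, 1), Point2(452, 303)),
]

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

# HYPOTHETICAL height constraint: a point one unit above the origin.
# Its pixel is NOT real data; it is simply where IDENTITY projects the point,
# so in this demo it contributes zero error.
extra_world = Point3(0, 1, 0)
extra_pixel = project(extra_world, IDENTITY)

base = evaluate(IDENTITY, GROUND_PAIRS)
with_extra = evaluate(IDENTITY, GROUND_PAIRS + [(extra_world, extra_pixel)])
print(base, with_extra)
```

In real use the extra pixel would be read off the image like the other four. Note that once a point has a non-zero y, the second column of the matrix starts to affect the error, so it has to be perturbed as well.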
