Transforming captured co-ordinates into screen co-ordinates


Problem description


    I think this is probably a simple maths question but I have no idea what's going on right now.

    I'm capturing the positions of "markers" on a webcam and I have a list of markers and their co-ordinates. Four of the markers are the outer corners of a work surface, and the fifth (green) marker is a widget. Like this:

    Here's some example data:

    • Top left marker (a=98, b=86)
    • Top right marker (c=119, d=416)
    • Bottom left marker (e=583, f=80)
    • Bottom right marker (g=569, h=409)
    • Widget marker (x=452, y=318)

    I'd like to somehow transform the webcam's widget position into a co-ordinate to display on the screen, where top left is 0,0 not 98,86 and somehow take into account the warped angles from the webcam capture.

    Where would I even begin? Any help appreciated

    Solution

    In order to compute the warping, you need to compute a homography between the four corners of your input rectangle and the screen.

    Since your webcam polygon seems to be an arbitrary quadrilateral, a full perspective homography can be used to convert it to a rectangle. It's not that complicated, and you can solve for it with a standard numerical routine (available in most math libraries) known as Singular Value Decomposition, or SVD.

    Background information:

    For planar transformations like this, you can describe them with a homography, which is a 3x3 matrix H such that if any point on or in your webcam polygon, say x1, is multiplied by H, i.e. H*x1, we get the corresponding point on the (rectangular) screen, i.e. x2.

    Now, note that these points are represented by their homogeneous coordinates, which simply means adding a third coordinate (the reason for which is beyond the scope of this post). So, suppose your coordinates for X1 were (100,100); then the homogeneous representation would be the column vector x1 = [100;100;1] (where ; represents a new row).
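    As a small illustration of the homogeneous representation just described, the following MATLAB sketch converts a point back and forth. The variable names (p, x1_scaled, p_back) are made up for this example.

```matlab
% Euclidean point (100,100) written as a homogeneous column vector
p  = [100; 100];
x1 = [p; 1];                 % [100; 100; 1]

% Any nonzero multiple of x1 represents the same 2-D point
x1_scaled = 2.5 * x1;        % [250; 250; 2.5]

% Back to ordinary coordinates: divide by the third element
p_back = x1_scaled(1:2) / x1_scaled(3);
disp(p_back.');              % same point as p
```

    The division in the last step is the "normalizing" used throughout the rest of this answer.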

    Ok, so now we have 8 homogeneous vectors representing 4 points on the webcam polygon and the 4 corners of your screen - this is all we need to compute a homography.

    Computing the homography:

    A little math: I'm not going to get into the math, but briefly this is how we solve it:

    We know that 3x3 matrix H,

    H = 
    
    h11 h12 h13
    h21 h22 h23
    h31 h32 h33
    
    where hij represents the element in H at the ith row and the jth column
    

    can be used to get the new screen coordinates by x2 = H*x1. Also, the result will be something like x2 = [12;23;0.1], so to get it in screen coordinates we normalize it by the third element: X2 = (12/0.1, 23/0.1) = (120,230).

    So this means each point in your webcam polygon (WP) can be multiplied by H (and then normalized) to get your screen coordinates (SC), i.e.

    SC1 = H*WP1
    SC2 = H*WP2
    SC3 = H*WP3
    SC4 = H*WP4
    where SCi refers to the ith point in screen coordinates and 
          WPi means the same for the webcam polygon
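    The multiply-then-normalize step can be sketched in MATLAB as below. The helper names dehomog and warp are hypothetical, and the matrix H here is a toy pure-scaling homography chosen only for illustration, not the one for the webcam data.

```matlab
% Divide a homogeneous 3-vector by its third element
dehomog = @(v) v(1:2) / v(3);

% Hypothetical helper: map a 2-D point p through a 3x3 homography H
warp = @(H, p) dehomog(H * [p(:); 1]);

% The example from the text: [12; 23; 0.1] normalizes to about (120, 230)
X2 = dehomog([12; 23; 0.1]);

% Toy homography (pure scaling, for illustration only)
H  = [2 0 0; 0 3 0; 0 0 1];
q1 = warp(H, [10 20]);       % [20; 60]
q2 = warp(5 * H, [10 20]);   % same point as q1
```

    Since q1 and q2 agree, multiplying H by any nonzero constant does not change the mapping, which is why H can later be divided by h33 without affecting the result.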
    

    Computing H: (the quick and painless explanation)

    Pseudocode:

    for n = 1 to 4
    {
        // WP_n refers to the nth point in the webcam polygon
        X = WP_n;
    
        // SC_n refers to the nth point in the screen coordinates
        // corresponding to the nth point in the webcam polygon
    
        // For example, WP_1 and SC_1 is the top-left point for the webcam
        // polygon and the screen coordinates respectively.
    
        x = SC_n(1); y = SC_n(2);
    
        // A is the matrix which we'll solve to get H
        // A(i,:) is the ith row of A
    
        // Here we're stacking 2 rows per point correspondence on A
        // X(i) is the ith element of the vector X (the webcam polygon coordinates, e.g. (120,230))
        A(2*n-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];    
        A(2*n,:)   = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
    }
    

    Once you have A, just compute svd(A), which decomposes it into U, S and V^T (such that A = U*S*V^T). The right singular vector corresponding to the smallest singular value (the last column of V) is H, once you reshape it into a 3x3 matrix.
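    For readers who want the step that was skipped, here is a short sketch of where the two rows per correspondence come from. The notation is assumed for this sketch: (X_1, X_2) is a webcam point, (x, y) is its screen point, and h stacks the rows of H.

```latex
\begin{aligned}
x &= \frac{h_{11}X_1 + h_{12}X_2 + h_{13}}{h_{31}X_1 + h_{32}X_2 + h_{33}}, \qquad
y = \frac{h_{21}X_1 + h_{22}X_2 + h_{23}}{h_{31}X_1 + h_{32}X_2 + h_{33}} \\[4pt]
0 &= -X_1 h_{21} - X_2 h_{22} - h_{23} + yX_1 h_{31} + yX_2 h_{32} + y\,h_{33} \\
0 &= X_1 h_{11} + X_2 h_{12} + h_{13} - xX_1 h_{31} - xX_2 h_{32} - x\,h_{33} \\[4pt]
A\mathbf{h} &= \mathbf{0}, \qquad
\mathbf{h} = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^{T}
\end{aligned}
```

    The two middle lines are rows 2n-1 and 2n of A, obtained by multiplying out the denominators. Four correspondences give 8 equations in 9 unknowns, so h is only determined up to scale, and the last column of V spans that one-dimensional null space.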

    With H, you can retrieve the "warped" coordinates of your widget marker location by multiplying it with H and normalizing.

    Example:

    In your particular example if we assume that your screen size is 800x600,

    WP =
    
        98   119   583   569
        86   416    80   409
         1     1     1     1
    
    SC =
    
         0   799     0   799
         0     0   599   599
         1     1     1     1
    
    where each column of WP and the same column of SC form a pair of corresponding points.
    

    Then we get:

    H = 
       -0.0155   -1.2525  109.2306
       -0.6854    0.0436   63.4222
        0.0000    0.0001   -0.5692
    

    Again, I'm not going into the math, but if we normalize H by h33, i.e. divide each element in H by -0.5692 in the example above,

    H =
        0.0272    2.2004 -191.9061
        1.2042   -0.0766 -111.4258
       -0.0000   -0.0002    1.0000
    

    This gives us a lot of insight into the transformation.

    • [-191.9061;-111.4258] (the first two entries of the last column) defines the translation of your points (in pixels).
    • [0.0272 2.2004;1.2042 -0.0766] (the upper-left 2x2 block) defines the affine part of the transformation, which is essentially scaling and rotation.
    • The last entry is 1.0000 because we divided H by h33.
    • [-0.0000 -0.0002] (the first two entries of the last row) denotes the projective part of the transformation of your webcam polygon.

    Also, you can check whether H is accurate by multiplying SC = H*WP and normalizing each column by its last element:

    SC = H*WP    
    
        0.0000 -413.6395         0 -411.8448
       -0.0000    0.0000 -332.7016 -308.7547
       -0.5580   -0.5177   -0.5554   -0.5155
    

    Dividing each column by its last element (e.g. in column 2, -413.6395/-0.5177 and 0/-0.5177):

    SC
       -0.0000  799.0000         0  799.0000
        0.0000   -0.0000  599.0000  599.0000
        1.0000    1.0000    1.0000    1.0000
    

    Which is the desired result.

    Widget Coordinates:

    Now, your widget coordinates can be transformed as well: H*[452;318;1], which (after normalizing) is (561.4161,440.9433).

    So, this is what it would look like after warping:

    As you can see, the green + represents the widget point after warping.

    Notes:

    1. There are some nice pictures in this article explaining homographies.
    2. You can play with transformation matrices here

    MATLAB Code:

    WP =[
        98   119   583   569
        86   416    80   409
         1     1     1     1
         ];
    
    SC =[
         0   799     0   799
         0     0   599   599
         1     1     1     1
         ];    
    
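    A = zeros(8,9);   % 2 rows per point correspondence, 9 unknowns (h11..h33)
    
    for i = 1 : 4
        X = WP(:,i);                  % webcam point (homogeneous)
        x = SC(1,i); y = SC(2,i);     % matching screen point
        A(2*i-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];
        A(2*i,:)   = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
    end
    
    [U, S, V] = svd(A);
    
    % Last column of V = right singular vector of the smallest singular value
    H = transpose(reshape(V(:,end),[3 3]));   % reshape fills columns, so transpose
    H = H/H(3,3);                             % normalize so that h33 = 1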
    

    A

           0           0           0         -98         -86          -1           0           0           0
          98          86           1           0           0           0           0           0           0
           0           0           0        -119        -416          -1           0           0           0
         119         416           1           0           0           0      -95081     -332384        -799
           0           0           0        -583         -80          -1      349217       47920         599
         583          80           1           0           0           0           0           0           0
           0           0           0        -569        -409          -1      340831      244991         599
         569         409           1           0           0           0     -454631     -326791        -799
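    As a cross-check on the numbers above, here is a sketch of an alternative route that is not the SVD method of this answer: assuming h33 is nonzero, fix h33 = 1 and solve the remaining 8 unknowns with a plain linear solve. It then maps the four corners and the widget. Variable names such as h8, P, max_err and widget are made up for this example.

```matlab
% Webcam corners (columns: TL, TR, BL, BR) and matching screen corners
WP = [98 119 583 569; 86 416 80 409];
SC = [0 799 0 799; 0 0 599 599];

% Same two rows per correspondence as in the SVD version
A = zeros(8, 9);
for i = 1:4
    X = WP(:, i);  x = SC(1, i);  y = SC(2, i);
    A(2*i-1, :) = [0 0 0 -X(1) -X(2) -1  y*X(1)  y*X(2)  y];
    A(2*i,   :) = [X(1) X(2) 1 0 0 0  -x*X(1) -x*X(2) -x];
end

% Assumption: h33 ~= 0, so set h33 = 1 and move its column to the right side
h8 = A(:, 1:8) \ (-A(:, 9));
H  = transpose(reshape([h8; 1], [3 3]));

% Check: the four webcam corners should land on the screen corners
P = H * [WP; ones(1, 4)];
P = P(1:2, :) ./ repmat(P(3, :), 2, 1);
max_err = max(abs(P(:) - SC(:)));

% Map the widget marker and normalize
w = H * [452; 318; 1];
widget = w(1:2) / w(3);

fprintf('max corner error: %g\n', max_err);
fprintf('widget on screen: (%.4f, %.4f)\n', widget(1), widget(2));
```

    With four exact correspondences both routes describe the same homography, so the widget position printed here should agree with the (561.4161,440.9433) obtained above. The SVD version remains the safer choice when h33 might be close to zero or when more than four noisy points are used.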
    
