K-d 树:具有易处理伪代码的最近邻搜索算法 [英] K-d tree: nearest neighbor search algorithm with tractable pseudo code

查看:90
本文介绍了K-d 树:具有易处理伪代码的最近邻搜索算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

维基百科中最近邻 (NN) 搜索的伪代码对我来说不够易于处理.与实现有关的帖子很少,但它们似乎是特定于语言的.所以我发现很难理解 NN 搜索是如何工作的.该图取自

The pseudo-code for nearest neighbor (NN) search in Wikipedia is not tractable enough for me. Few more posts available with implementations, but they seem to be language specific. So I'm finding it hard to understand how NN search works. This diagram is taken from https://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/kdtrees.pdf. And I'm trying to understand it with a specific case say query point Q = (52,52). Assume two dim are (x,y) and root level splits by x-dim.

搜索神经网络:

首先,我从根到叶子,就好像我要插入 Q 一样;这样做,叶子是(55,1).将(全局)var current_best 从 INFINITY 更新为 (55-52)2 + (1-52)2 = 2610.

First, I go down from root to a leaf as if I'm trying to insert the Q; and doing so, the leaf is (55,1). Update a (global) var current_best from INFINITY to (55-52)2 + (1-52)2 = 2610.

接下来,我上升到 (70,70) 并将 current_best 从 2610 更新为 182+182=648.由于这提供了更好的距离,我们必须探查它的子树:这是正确的理解吗?

Next, I go up to (70,70) and updates current_best from 2610 to 182+182=648. Since this gives better distance, we must probe it's sub-tree: is this correct understanding?

此外,我们看到节点 (60,80) 没有给出更好的结果(即 current_best 没有更新).

Further, we see node (60,80) doesn't give better result (i.e. no update for current_best).

在进一步向上的过程中,我们发现 root (51,75) 给出了更好的结果(current_best 设置为 530).所以,根据我的理解,我们必须检查它的其他子树.

In the process of going up further, we find root (51,75) gives even better result (current_best gets set to 530). So, applying my understanding, we must check it's other sub-tree.

(25,40) 不会产生更好的结果.我的理解是,我们仍然需要验证(25,40)的子树.但是,在这种情况下,由于该节点使用 y-dim,并且由于 Qy > 40,我们只需要检查右子树(根在 (35,90)):这是正确的理解吗?

The (25,40) doesn't yield any better result. What I understand is that we still need to verify the sub-tree of (25,40). However, in this case, since this node uses y-dim, and since Q.y > 40, we need to check only right sub-tree (rooted at (35,90)): is this correct understanding?

简而言之,如果一个节点为 current_distance 提供更好的结果,我们必须探测两个子节点;如果一个节点没有提供更好的结果,我们所能做的就是忽略其中一个子树,但必须根据条件探测另一个子树(按特定维度分割超平面).这是正确的理解吗?

In short, what I see is if a node provides better result for current_distance, we must probe both children nodes; and if a node does NOT provide better result, all we can do it ignore one of the sub-tree but must probe the other sub-tree based on the condition (splitting hyperplane by specific dimension). Is this correct understanding?

最后,我真的很感谢任何人为 Kd 树的 NN 搜索提供易于处理的伪代码

推荐答案

想象目标点和它周围的圆盘,其半径等于迄今为止找到的最短距离(初始为无穷大).

Imagine the target point and a disk around it, with radius equal to the shortest distance found so far (initially, infinity).

您位于将平面分成两个半平面的树的根部.使半径等于当前半径和目标到根的距离中的最小值.然后递归到与磁盘相交的半平面,只要根有儿子.

You are at the root of a tree that splits the plane in two half planes. Make the radius equal to the minimum of the current radius and the distance from the target to the root. Then recurse to the half-plane(s) that intersect(s) the disk, as long as the root has sons.

确保跟踪哪个根达到了最小值.

Make sure to keep a trace of which root achieved the minimum.

Visit(root):
    d= distance(target, root)
    if d < r:
        r= d
        closest= root

    if root.left and root.x - target.x < r:
        Visit(root.left)
    if root.right and target.x - root.x < r:
        Visit(root.right)

注意:根据您使用的轴选择策略,半平面测试在 xy 上进行.

Caution: the half-plane test is on x or y depending on the axis selection policy that you use.

这篇关于K-d 树:具有易处理伪代码的最近邻搜索算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆