Python中的增量最近邻算法 [英] Incremental Nearest Neighbor Algorithm in Python

查看:178
本文介绍了Python中的增量最近邻算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有人知道可以用增量方式更新的Python中实现的最近邻居算法吗?我发现的所有文件,例如此文件,似乎都是批处理.有可能实现增量式NN算法吗?

Is anyone aware of a nearest neighbor algorithm implemented in Python that can be updated incrementally? All the ones I've found, such as this one, appear to be batch processes. Is it possible to implement an incremental NN algorithm?

推荐答案

我认为KD-tree或KNN-tree的增量构造的问题是,正如您在评论中提到的那样,该树最终将变得不平衡,您将无法进行简单的树状旋转来解决平衡问题并保持一致性.至少,重新平衡任务并非微不足道,并且绝对不会在每次插入时都这样做.通常,人们会选择使用批处理方法构建一棵树,插入一堆新点,并允许该树变得不平衡直到一个点,然后重新平衡它.

I think the problem with incremental construction of a KD-tree or KNN-tree is, as you've alluded to in a comment, that the tree will eventually become unbalanced and you can't do simple tree rotation to fix balance problems and keep consistency. At the minimum, the re-balancing task is not trivial and one would definitely not want to do it at each insertion. Often, one will choose to build a tree with a batch method, insert a bunch of new points and allow the tree to become unbalanced up to a point, and then re-balance it.

要做的一件非常相似的事情是为M点批量构建数据结构,将其用于M'点,然后使用M + M'点批量重建数据结构.由于重新平衡不是正常的快速算法,对于树我们是熟悉的快速算法,因此重建不一定会比较慢,在某些情况下重建可能会更快(取决于点进入增量算法的方式).

A very similar thing to do is to build the data structure in batch for M points, use it for M' points, and then re-build the data structure in batch with M+M' points. Since re-balancing is not normal, fast algorithm we are familiar with for trees, rebuilding is not necessarily slow in comparison and in some cases can be faster (depending on how the sequence of the points entering your incremental algorithm).

也就是说,如果您采用重建方法,那么编写的代码量,调试的难度以及其他人对代码的理解的难度就会大大减少.如果这样做,则可以使用批处理方法,并保留尚未插入树中的点的外部列表.可以使用蛮力方法来确保所有这些都不比树中的那些更近.

That being said, the amount of code you write, debugging difficulty, and the ease of others' understanding of your code can be significantly smaller if you take the rebuild approach. If you do so, you can use a batch method and keep an external list of points not yet inserted into the tree. A brute force approach can be used to ensure none of these is closer than the ones in the tree.

下面是一些指向Python实现/讨论的链接,但是我还没有找到任何明确声称是增量的链接.祝你好运.

Some links to Python implementations/discussions are below, but I haven't found any that explicitly claim to be incremental. Good luck.

http://www.scipy.org/Cookbook/KDTree

http://cgi.di.uoa.gr/~compgeom /pycgalvisual/kdppython.shtml

http://sites.google.com/site/mikescoderama /Home/kd-tree-knn

http://en.wikipedia.org/wiki/Kd-tree

注意:我在这里的评论适用于高维空间.如果您使用2D或3D,我所说的内容可能不合适. (如果您在非常高的尺寸空间中工作,请使用蛮力或近似于最近的邻居.)

Note: My comments here apply to high-dimensional spaces. If you're working in 2D or 3D, what I've said may not be appropriate. (If you working in very high dimensional spaces, use brute force or approximate nearest neighbor.)

这篇关于Python中的增量最近邻算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆