理解 scipy.spatial.KDTree 中的 `leafsize` [英] Understanding `leafsize` in scipy.spatial.KDTree

查看:113
本文介绍了理解 scipy.spatial.KDTree 中的 `leafsize`的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题说明:

我在 3D 空间中有 150k 个点,它们的坐标存储在一个矩阵中,维度为 [150k, 3] 以毫米为单位.

I have 150k points in a 3D space with their coordinates stored in a matrix with dimension [150k, 3] in mm.

我想找到给定点 p 在半径 r 内的所有邻居.我想以最准确的方式做到这一点.

I want to find all the neighbors of a given point p that are within a radius r. And I want to do that in the most accurate way.

我应该如何选择我的 leafsize 参数?

How should I choose my leafsize parameter ?

from scipy.spatial import KDTree
import numpy as np

pts = np.random.rand(150000,3)

T1 = KDTree(pts, leafsize=20)
T2 = KDTree(pts, leafsize=1)

neighbors1= T1.query_ball_point((0.3,0.2,0.1), r=2.0)
neighbors2= T2.query_ball_point((0.3,0.2,0.1), r=2.0)

np.allclose(sorted(neighbors1), sorted(neighbors2))
True

推荐答案

query_ball_point 函数将为任何版本的搜索树返回正确的点集.leafsize 参数不会影响查询的结果,只会影响结果的性能.

The function query_ball_point will return the correct set of points for any version of the search tree. The leafsize parameter does not impact the results of the query, only the performance of the results.

想象一下下面显示的两棵树的相同数据(但不同的叶子尺寸参数)和一个搜索红色圆圈内所有点的查询.

Imagine two trees shown below for the same data (but different leafsize parameters) and a query searching for all points inside the red circle.

在这两种情况下,代码只会返回位于红色圆圈内的两个点.这是通过检查与圆相交的树的所有框中的所有点来完成的.在每种情况下,这会导致不同的工作量(即不同的性能).对于左树(对应于更大的叶子尺寸),算法必须检查圆内是否有 13 个点(6 个在上相交框中,7 个在下相交框中).在右边的树(叶子尺寸较小)中,只处理了三个点(一个在上面的相交框中,两个在下面的相交框中).

In both cases, the code will only return the two points that lie inside the red circle. This is done by checking all points in all boxes of the tree that intersect the circle. This leads to a different amount of work (i.e., different performance) in each case. For the left tree (corresponding to a larger leafsize), the algorithm has to check if 13 points are inside the circle (6 in the upper intersecting box and 7 in the lower intersecting box). In the right tree (which has a smaller leaf size), only three points get processed (one in the upper intersecting box and two in the lower intersecting box).

按照这个逻辑,你可能认为总是使用小叶尺寸是有意义的:这将最大限度地减少算法结束时的实际比较次数(确定点是否真的位于查询区域中).但这并不是那么简单:较小的叶子尺寸将生成更深的树,从而增加了构建时间和树遍历时间的成本.通过叶级比较获得树遍历性能的正确平衡实际上取决于进入树的数据类型以及您正在执行的特定叶级比较.这就是 scipy 提供 Leafsize 参数作为参数的原因,以便您可以调整事物以在特定算法上表现最佳.

Following this logic, you may think it just makes sense to always use a small leaf size: this will minimize the number of actual comparisons at the end of the algorithm (do decide if the points actually lie in the query region). But it isn't that simple: the smaller leaf size will generate a deeper tree adding cost to the construction time and to the tree traversal time. Getting the right balance of tree-traversal performance with the leaf-level comparisons really depends on the type of data going into the tree and the specific leaf-level comparisons you are doing. Which is why scipy provides the leafsize parameter as an argument so you can tune things to perform best on a particular algorithm.

这篇关于理解 scipy.spatial.KDTree 中的 `leafsize`的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆