Searching for k nearest points


Question

I have a large set of features that looks like this:

id1 28273 20866 29961 27190 31790 19714 8643 14482 5384 ....  upto 1000
id2 12343 45634 29961 27130 33790 14714 7633 15483 4484 ....  
id3 ..... ..... ..... ..... ..... ..... .... ..... .... .... .   .   .
...
id200000 .... .... ... ..  .  .  .  .

I want to compute the Euclidean distance for each id and sort them to find the 5 nearest points. Since my dataset is very large, what is the best way to do this?

Answer

scikit-learn has nearest-neighbor search. Example:

  1. Load your data into a NumPy array.

>>> import numpy as np
>>> X = np.array([[28273, 20866, 29961, 27190, 31790, 19714, 8643, 14482, 5384, ...],
                  [12343, 45634, 29961, 27130, 33790, 14714, 7633, 15483, 4484, ...], 
                  ...
                  ])

(Only two points are shown.)
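Since the question's data arrives as whitespace-separated rows with an id label in front, the array can be built by splitting off that label while loading. A minimal sketch (the helper name `load_features` and the inline sample are illustrative, not from the original answer):

```python
import numpy as np

def load_features(lines):
    """Split each whitespace-separated row into an id label and a float vector."""
    ids, rows = [], []
    for line in lines:
        parts = line.split()
        ids.append(parts[0])                        # leading "idN" label
        rows.append([float(v) for v in parts[1:]])  # numeric feature values
    return ids, np.array(rows)

# Truncated sample rows from the question:
ids, X = load_features(["id1 28273 20866 29961",
                        "id2 12343 45634 29961"])
```

The same function works on an open file handle, since iterating a file yields its lines.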

  2. Fit a NearestNeighbors object.

>>> from sklearn.neighbors import NearestNeighbors
>>> knn = NearestNeighbors(n_neighbors=5)
>>> knn.fit(X)
NearestNeighbors(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,
         radius=1.0, warn_on_equidistant=True)

p=2 means Euclidean (L2) distance. p=1 would mean Manhattan (L1) distance.
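As a quick sanity check on the p parameter (the two points here are made up for illustration): the distance between (0, 0) and (3, 4) is 5 under p=2 but 7 under p=1.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

a = np.array([[0.0, 0.0]])   # query point
b = np.array([[3.0, 4.0]])   # the only candidate neighbor

# p=2: Euclidean distance -> sqrt(3**2 + 4**2) = 5
d2, _ = NearestNeighbors(n_neighbors=1, p=2).fit(b).kneighbors(a)

# p=1: Manhattan distance -> |3| + |4| = 7
d1, _ = NearestNeighbors(n_neighbors=1, p=1).fit(b).kneighbors(a)
```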

  3. Perform queries. To get the neighbors of X[0], your first data point:

>>> knn.kneighbors(X[0], return_distance=False)
array([[0, 1]])

So, the nearest neighbors of X[0] are X[0] itself and X[1] (of course). (Note: recent scikit-learn versions require a 2-D query array, e.g. knn.kneighbors(X[0:1], return_distance=False).)

Make sure you set n_neighbors=6 because every point in your set is going to be its own nearest neighbor.
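Putting the steps together, here is a self-contained sketch of the asker's setting with random stand-in data (with the real 200,000 x 1,000 matrix the same calls apply, just slower):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.integers(0, 50000, size=(200, 10)).astype(float)  # stand-in for the real data

# Ask for 6 neighbors so each point's trivial match with itself can be dropped.
knn = NearestNeighbors(n_neighbors=6).fit(X)
dist, idx = knn.kneighbors(X)

# Column 0 is the point itself (distance 0); columns 1..5 are the 5 nearest others.
nearest5 = idx[:, 1:]
```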

Disclaimer: I'm involved in scikit-learn development, so this is not unbiased advice.

