Confused by the output of sklearn.neighbors.NearestNeighbors


Question

Here is the code.

from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)


>>> indices
array([[0, 1], [1, 0], [2, 1], [3, 4], [4, 3], [5, 4]])

>>> distances
array([[0.        , 1.        ], [0.        , 1.        ], [0.        , 1.41421356],
       [0.        , 1.        ], [0.        , 1.        ], [0.        , 1.41421356]])

I don't really understand the shape of 'indices' and 'distances'. How should I interpret these numbers?

Recommended answer

It's pretty straightforward, actually. For each data sample in the input to kneighbors() (X here), it returns 2 neighbors, because you specified n_neighbors=2. indices gives the index of each neighbor in the training data (again X here), and distances gives the distance from the query point to the corresponding training point (the one that the index refers to).

Take a single data point as an example. With X[0] as the first query point, the answer is given by indices[0] and distances[0].
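As a minimal sketch of reading one row of the result, the snippet below reuses the code from the question and looks only at the first query point. The variable names query, neighbor_ids, neighbor_dists and neighbor_points are illustrative and not part of scikit-learn.

```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)

query = 0                          # look at the first query point, X[0]
neighbor_ids = indices[query]      # positions of its neighbors in the training data
neighbor_dists = distances[query]  # distance from X[0] to each of those neighbors
neighbor_points = X[neighbor_ids]  # the actual neighbor samples, looked up by index

for rank, (i, d) in enumerate(zip(neighbor_ids, neighbor_dists), start=1):
    print(f"neighbor {rank} of X[{query}] is X[{i}] = {X[i]}, distance = {d}")
```

Indexing X with a row of indices is what turns the returned positions back into data samples.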

So for X[0]:

  • The index of the first nearest neighbor in the training data is indices[0, 0] = 0, and its distance is distances[0, 0] = 0. You can use this index value to get the actual data sample from the training data.

This makes sense: you used the same data for fitting and for querying, so the first nearest neighbor of each point is the point itself, at distance 0.

  • The index of the second nearest neighbor is indices[0, 1] = 1, and its distance is distances[0, 1] = 1.
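If the self-match in the first column is not wanted, one option is to call kneighbors() without an argument: scikit-learn then returns neighbors of the fitted points and does not count a point as its own neighbor. The sketch below shows that, and also queries a point that is not in the training data. The query point [1, 0] and the names no_self_dist, no_self_idx, new_dist and new_idx are chosen here purely for illustration.

```python
from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)

# No argument: neighbors of the fitted points, each point excluded from its own list.
no_self_dist, no_self_idx = nbrs.kneighbors()
print(no_self_idx[0], no_self_dist[0])  # neighbors of X[0], without X[0] itself

# A query point that was not part of the training data (illustrative value).
new_point = np.array([[1, 0]])
new_dist, new_idx = nbrs.kneighbors(new_point)
print(new_idx, new_dist)  # no zero-distance self-match here
```

With a separate query set, the first column is simply the closest training point rather than a trivial zero-distance match.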

The same holds for all other points. The first dimension of indices and distances corresponds to the query points, and the second dimension corresponds to the number of neighbors requested.
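To see where the numbers come from, here is a rough brute-force sketch in plain NumPy that rebuilds both arrays for this small data set. It assumes Euclidean distance, which matches the default metric of NearestNeighbors, and it is not how the ball tree computes the result internally. The names pairwise, brute_indices and brute_distances are illustrative.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
k = 2  # number of neighbors requested, as in n_neighbors=2

# Pairwise Euclidean distances: row = query point, column = training point.
diff = X[:, None, :] - X[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))  # shape (6, 6)

# For each query row, keep the positions of the k smallest distances.
brute_indices = np.argsort(pairwise, axis=1, kind="stable")[:, :k]
brute_distances = np.take_along_axis(pairwise, brute_indices, axis=1)

# Both arrays have shape (number of query points, number of neighbors).
print(brute_indices.shape, brute_distances.shape)
print(brute_indices)
print(brute_distances)
```

Each row of the two arrays belongs to one query point, and each column to one neighbor rank, which is the layout described above.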
