Calculating average distance of nearest neighbours in pandas dataframe

Question

I have a set of objects (cars) and their positions over time. I would like to get the distance between each car and its nearest neighbour, and then calculate the average of these distances for each time point. An example dataframe is as follows:

import pandas as pd

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})
df.set_index('time')  # displayed with time as the index

         x       y      car
 time
  0     216     13       1
  0     218     12       2
  0     217     12       3
  1     280     110      1
  1     290     109      3
  2     130     3        4
  2     132     56       5

For each time point, I would like to know the nearest car neighbour for each car. Example:

df2

          car    nearest_neighbour    euclidean_distance  
 time
  0       1            3                    1.41
  0       2            3                    1.00
  0       3            1                    1.41
  1       1            3                    10.05
  1       3            1                    10.05
  2       4            5                    53.04
  2       5            4                    53.04

I know I can calculate the pairwise distances between cars using the approach in "How to apply euclidean distance function to a groupby object in pandas dataframe?", but how do I get the nearest neighbour for each car?

After that, it seems simple enough to get the average of the distances for each frame using groupby, but it's the nearest-neighbour step that really throws me off. Help appreciated!
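For comparison with the scikit-learn answer, the step the question is stuck on can also be done directly from the pairwise distances: within one time point, a car's nearest neighbour is the smallest off-diagonal entry in its row of the distance matrix. The sketch below is a minimal NumPy/pandas version of that idea, not code from the linked question; the helper name `nearest_neighbours` is hypothetical. It builds a table shaped like `df2`.

```python
import numpy as np
import pandas as pd

def nearest_neighbours(points):
    """For each row of an (n, 2) array, return the index of and distance to the nearest other row."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix via broadcasting, shape (n, n)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself; a lone car ends up
    # "matched" to itself with distance inf
    np.fill_diagonal(dist, np.inf)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(points)), idx]

df = pd.DataFrame({'time': [0, 0, 0, 1, 1, 2, 2],
                   'x': [216, 218, 217, 280, 290, 130, 132],
                   'y': [13, 12, 12, 110, 109, 3, 56],
                   'car': [1, 2, 3, 1, 3, 4, 5]})

rows = []
for t, group in df.groupby('time'):
    # Nearest neighbour among the cars present at this time point only
    idx, d = nearest_neighbours(group[['x', 'y']].to_numpy())
    cars = group['car'].to_numpy()
    rows.append(pd.DataFrame({'time': t, 'car': cars,
                              'nearest_neighbour': cars[idx],
                              'euclidean_distance': d}))
df2 = pd.concat(rows).set_index('time')
print(df2)
```

This builds a full n × n matrix per time point, which is fine for a handful of cars but grows quadratically with the group size; for large groups a tree-based search such as scikit-learn's `NearestNeighbors` scales better.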

Recommended answer

It might be a bit of overkill, but you could use `NearestNeighbors` from scikit-learn.

An example:

from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    # n_neighbors=2: each point's first neighbour is itself (distance 0),
    # the second is its nearest *other* point
    nbrs = NearestNeighbors(n_neighbors=2, algorithm='auto', metric='euclidean').fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})

nn_rows = []
for t, group in df.groupby('time'):
    # Distances and positional indices of the nearest neighbours within this time point
    # (every time point needs at least 2 cars for n_neighbors=2)
    distances, indices = nn(group[['x', 'y']].to_numpy())
    cars = group['car'].to_numpy()
    for j in range(len(group)):
        nn_rows.append({'time': t,
                        'car': cars[j],
                        # column 1 is the nearest other car; column 0 is the car itself
                        # (assuming no two cars share exactly the same position)
                        'nearest_neighbour': cars[indices[j, 1]],
                        'euclidean_distance': distances[j, 1]})

nn_df = pd.DataFrame(nn_rows).set_index('time')

Result:

      car  euclidean_distance  nearest_neighbour
time                                            
0       1            1.414214                  3
0       2            1.000000                  3
0       3            1.000000                  2
1       1           10.049876                  3
1       3           10.049876                  1
2       4           53.037722                  5
2       5           53.037722                  4

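(Note that at time 0, car 3's nearest neighbour is car 2, not car 1 as in the expected output in the question: the distance to car 1 is sqrt((217-216)**2 + (12-13)**2), about 1.4142135623730951, while the distance to car 2 is sqrt((218-217)**2 + (12-12)**2) = 1.)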
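The remaining step from the question, averaging the nearest-neighbour distance at each time point, is then a single `groupby`. A minimal sketch: so that it runs on its own, the `car` and `euclidean_distance` values are copied from the result table above; with the answer's code you would call the same `groupby` on `nn_df` directly.

```python
import pandas as pd

# Nearest-neighbour distances from the result table above, indexed by time
nn_df = pd.DataFrame(
    {'car': [1, 2, 3, 1, 3, 4, 5],
     'euclidean_distance': [1.414214, 1.0, 1.0, 10.049876, 10.049876,
                            53.037722, 53.037722]},
    index=pd.Index([0, 0, 0, 1, 1, 2, 2], name='time'),
)

# Mean nearest-neighbour distance per time point, grouping on the 'time' index level
avg_nn_distance = nn_df.groupby(level='time')['euclidean_distance'].mean()
print(avg_nn_distance)
```

With these values the averages come out at roughly 1.138 for time 0, 10.05 for time 1 and 53.04 for time 2.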
