Python Pandas中的距离矩阵 [英] Distance matrix in Python Pandas

查看:224
本文介绍了Python Pandas中的距离矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是python的新手,但是我喜欢在熊猫中处理数据.

I am a newbie in python, but I like to process data in pandas.

我有一百对CSV数据,例如乘客和公共汽车站数据.乘客结构数据是人"和XY坐标(UTM-Meter).公交车站数据结构是公交车站名称(BS)和XY坐标(UTM-Meter. 我的数据如下:

I have a hundred pairs of CSV data such as passenger and bus stop data. The passenger structure data is Person, and XY coordinates (UTM-Meter). The bus stop data structure is the bus stop name (BS), and XY coordinates (UTM-Meter. My data looks like this:

df(person)
Person  X           Y
A_first 260357.3199 4064458.685
A_last  261618.5243 4064789.928
B_first 258270.5926 4063717.789
B_last  258270.5926 4063717.789
C_first 259051.758  4064462.021
C_last  258270.5926 4063717.789
D_first 260764.3916 4064624.977
D_last  260753.9053 4064569.745
E_first 258270.5926 4063717.789
E_last  258270.5926 4063717.789
F_first 258262.2825 4063740.234
F_last  258270.5926 4063717.789
G_first 258326.2393 4064104.899
G_last  258270.5926 4063717.789
H_first 259491.5911 4064838.328
H_last  260747.1408 4064647.669

df(bus stop)
BS  X           Y 
BS1 258280.8228 4063715.835
BS2 259961.3734 4064840.298
BS3 260360.0219 4064468.593
BS4 260530.5126 4064683.101
BS5 260806.7009 4064399.184
BS6 261178.4165 4063605.91
BS7 261222.1745 4064495.158
BS8 261627.6173 4064281.38
BS9 261662.8833 4064793.444

我想分析人与公交车站数据之间的distance matrix.我的期望输出是InputID(人),TargetID(公交车站)和距离.距离是人员点数据与公交车站数据之间的距离.我的预期结果:

I want to analyze the distance matrix between person and bus stop data. My desire output is InputID (Person), TargetID (bus stop) and distance. Distance is the distance between person point data and bus stop point data. My expected result:

result
InputID (Person)    TargetID (BS)   Distance (meter)
A_first                 BS1                 xx
A_first                 BS2                 xx
A_first                 BS3                 xx
A_first                 BS4                 xx
A_first                 BS5                 xx
A_first                 BS6                 xx
A_first                 BS7                 xx
A_first                 BS8                 xx
A_first                 BS9                 xx
A_last                  BS1                 xx
A_last                  BS2                 xx
A_last                  BS3                 xx
A_last                  BS4                 xx
A_last                  BS5                 xx
A_last                  BS6                 xx
A_last                  BS7                 xx
A_last                  BS8                 xx
A_last                  BS9                 xx
............. so on

对我有什么建议吗?谢谢

Any advice for me? thank you

推荐答案

您可以先使用

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})

df2 = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})


df1['key'] = 0
df2['key'] = 0

df_cartesian = df1.merge(df2, on='key').drop(columns=['key'])

然后计算按行计算的X和Y差,并采用L2范数:

Then calculate the row-wise X and Y difference, and take the L2 norm:

df_cartesian['X_diff'] = df_cartesian['X_x'] - df_cartesian['X_y']
df_cartesian['Y_diff'] = df_cartesian['Y_x'] - df_cartesian['Y_y']

df_cartesian['dist'] = np.linalg.norm(df_cartesian[['X_diff', 'Y_diff']], axis=1)

df_cartesian
#     Person          X_x          Y_x   BS          X_y          Y_y  \
# 0  A_first  260357.3199  4064458.685  BS1  258280.8228  4063715.835   
# 1  A_first  260357.3199  4064458.685  BS2  259961.3734  4064840.298   
# 2   A_last  261618.5243  4064789.928  BS1  258280.8228  4063715.835   
# 3   A_last  261618.5243  4064789.928  BS2  259961.3734  4064840.298   
# 4  B_first  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 5  B_first  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 6   B_last  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 7   B_last  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 
#       X_diff    Y_diff         dist  
# 0  2076.4971   742.850  2205.372152  
# 1   395.9465  -381.613   549.911004  
# 2  3337.7015  1074.093  3506.269681  
# 3  1657.1509   -50.370  1657.916235  
# 4   -10.2302     1.954    10.415138  
# 5 -1690.7808 -1122.509  2029.474358  
# 6   -10.2302     1.954    10.415138  
# 7 -1690.7808 -1122.509  2029.474358  

这篇关于Python Pandas中的距离矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆