Calculate distance based on a lookup dataframe


Question

I have a DataFrame and a lookup table. For each key in the DataFrame, I would like to look up the corresponding row in the lookup table and calculate the Euclidean distance over a number of columns. Mock data looks like this:

import pandas as pd
import numpy.random as rand

df = pd.DataFrame({'key':rand.randint(0, 5, 10), 
                    'X': rand.randn(10),  
                    'Y': rand.randn(10),  
                    'Z': rand.randn(10)})

          X         Y         Z  key
0  0.163142  0.387871 -0.433157    3
1 -2.020957 -1.537615 -1.996704    0
2  1.249118  1.633246  0.028222    1
3 -0.019601  1.757136  0.787936    2
4 -0.039649  1.380557  0.123677    0
5  0.500814 -1.049591 -1.261868    3
6  1.175576 -0.310895  0.549420    0
7 -0.152696  0.139020  0.887219    2
8  0.491099  0.434652  0.791038    2
9 -0.231334  0.264414  0.913475    4


lookup = pd.DataFrame({'X': rand.randn(5),  
                    'Y': rand.randn(5),  
                    'Z': rand.randn(5)})

          X         Y         Z
0  0.242419 -0.630230 -0.254344
1  0.799573  0.354169  1.099456
2 -0.754582 -1.882192 -1.270382
3 -1.645707 -0.131905 -0.445954
4  0.743351  0.456220  0.975457
5  0.136197  0.278329 -2.336110

For example, row 0 has the values

df.loc[0,'X':'Z'].values
[0.163142,0.387871,-0.433157]

The key is 3, so the row in the lookup is

lookup.iloc[3,:].values
[-1.645707 -0.131905 -0.445954]

and the distance is

import numpy as np
# key 3 -> lookup row 3, i.e. [-1.645707, -0.131905, -0.445954]
np.linalg.norm(np.array([0.163142, 0.387871, -0.433157]) - np.array([-1.645707, -0.131905, -0.445954]))
# approximately 1.8821

I would like to do this for every row in df and return the value as a new column. Is there a slick way to do this?

Recommended answer

A somewhat cleaner and much faster version of @Wen's answer. It still uses reindex, but with numpy.linalg.norm instead of scipy.spatial.distance.euclidean:

import numpy as np
dims = ['X', 'Y', 'Z']
# reindex returns one lookup row per row of df, in df's key order
df['distance'] = np.linalg.norm(df[dims].values - lookup.reindex(df['key']).values, axis=1)
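To sanity-check the one-liner, here is a minimal self-contained sketch that builds mock data like the question's and compares the vectorized result with a plain row-by-row computation. The seed, the `rng` generator, and the names `matched` and `expected` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily, only so the run is reproducible (assumption).
rng = np.random.default_rng(0)
dims = ['X', 'Y', 'Z']

# Mock data shaped like the question's: 10 rows with keys in 0..4, 5 lookup rows.
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=dims)
df['key'] = rng.integers(0, 5, size=10)
lookup = pd.DataFrame(rng.standard_normal((5, 3)), columns=dims)

# Vectorized: pull one lookup row per df row (by key), then take row-wise norms.
matched = lookup.reindex(df['key'])[dims].to_numpy()
df['distance'] = np.linalg.norm(df[dims].to_numpy() - matched, axis=1)

# Reference: the same distance computed one row at a time.
expected = [
    np.linalg.norm(
        row[dims].to_numpy(dtype=float)
        - lookup.loc[int(row['key']), dims].to_numpy(dtype=float)
    )
    for _, row in df.iterrows()
]

assert np.allclose(df['distance'], expected)
```

Selecting `[dims]` on the reindexed lookup keeps the subtraction correct even if the lookup table later gains extra columns. Note that reindex fills rows with NaN for any key missing from the lookup index, so such rows would get a NaN distance rather than raising an error.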
