使用距离矩阵计算Pandas Dataframe中行之间的距离 [英] Distance calculation between rows in Pandas Dataframe using a distance matrix

查看:1151
本文介绍了使用距离矩阵计算Pandas Dataframe中行之间的距离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下Pandas DataFrame:

I have the following Pandas DataFrame:

In [31]:
import pandas as pd
sample = pd.DataFrame({'Sym1': ['a','a','a','d'],'Sym2':['a','c','b','b'],'Sym3':['a','c','b','d'],'Sym4':['b','b','b','a']},index=['Item1','Item2','Item3','Item4'])
In [32]: print(sample)
Out [32]:
      Sym1 Sym2 Sym3 Sym4
Item1    a    a    a    b
Item2    a    c    c    b
Item3    a    b    b    b
Item4    d    b    d    a

,我想找到一种优雅的方法来根据此距离矩阵获取每个Item之间的距离:

and I want to find the elegant way to get the distance between each Item according to this distance matrix:

In [34]:
DistMatrix = pd.DataFrame({'a': [0,0,0.67,1.34],'b':[0,0,0,0.67],'c':[0.67,0,0,0],'d':[1.34,0.67,0,0]},index=['a','b','c','d'])
print(DistMatrix)
Out[34]:
      a     b     c     d
a  0.00  0.00  0.67  1.34
b  0.00  0.00  0.00  0.67
c  0.67  0.00  0.00  0.00
d  1.34  0.67  0.00  0.00 

例如,将Item1Item2进行比较将比较aaab-> accb-使用距离矩阵,它将为0+0.67+0.67+0=1.34

For example comparing Item1 to Item2 would compare aaab -> accb -- using the distance matrix this would be 0+0.67+0.67+0=1.34

理想的输出:

       Item1   Item2  Item3  Item4
Item1      0    1.34     0    2.68
Item2     1.34    0      0    1.34
Item3      0      0      0    2.01
Item4     2.68  1.34   2.01    0

推荐答案

这是需要完成的工作的两倍,但是从技术上讲,它也适用于非对称距离矩阵(无论是什么意思)

this is doing twice as much work as needed, but technically works for non-symmetric distance matrices as well ( whatever that is supposed to mean )

pd.DataFrame ( { idx1: { idx2:sum( DistMatrix[ x ][ y ]
                                  for (x, y) in zip( row1, row2 ) ) 
                         for (idx2, row2) in sample.iterrows( ) } 
                 for (idx1, row1 ) in sample.iterrows( ) } )

您可以通过分段编写使其更具可读性:

you can make it more readable by writing it in pieces:

# a helper function to compute distance of two items
dist = lambda xs, ys: sum( DistMatrix[ x ][ y ] for ( x, y ) in zip( xs, ys ) )

# a second helper function to compute distances from a given item
xdist = lambda x: { idx: dist( x, y ) for (idx, y) in sample.iterrows( ) }

# the pairwise distance matrix
pd.DataFrame( { idx: xdist( x ) for ( idx, x ) in sample.iterrows( ) } )

这篇关于使用距离矩阵计算Pandas Dataframe中行之间的距离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆