Generating a similarity matrix from pandas dataframe

Views: 140   Posted: 2020/5/24 4:19:42   Tags: python, pandas, dataframe, similarity

Question

I have a df:

id    val1    val2    val3
100    aa      bb      cc
200    bb      cc      0
300    aa      cc      0
400    bb      aa      cc

From this I have to generate a df, something like this:

     100  200  300  400                    
100    3    2    2    3
200    2    2    1    2
300    2    1    2    2
400    3    2    2    3

Explanation: id 100 contains aa, bb, cc and id 200 contains bb, cc, 0, so they have 2 values in common (bb and cc).

Therefore, in my final matrix, the cell at index 100, column 200 should contain 2.

Similarly, id 200 has the values bb, cc, 0 and id 300 has aa, cc, 0.

Here the similarity is 1 (only cc is shared; the 0 marks an empty slot and is not counted, which is also why the diagonal entry for 200 is 2), so the cell at index 200, column 300 should contain 1.
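To pin down the definition, here is a minimal pure-Python sketch (no pandas) that treats each id's values as a set, drops the 0 placeholder, and counts the shared values for every pair of ids. The rows dictionary and the helper name similarity_matrix are hypothetical, chosen only for illustration.

```python
from itertools import product

# Rows from the question; '0' marks an empty slot (placeholder, not a real value)
rows = {
    100: ["aa", "bb", "cc"],
    200: ["bb", "cc", "0"],
    300: ["aa", "cc", "0"],
    400: ["bb", "aa", "cc"],
}

def similarity_matrix(rows, placeholder="0"):
    """Hypothetical helper: {(i, j): number of shared non-placeholder values}."""
    # Each row becomes a set, so a value counts at most once per id
    sets = {k: set(v) - {placeholder} for k, v in rows.items()}
    # Intersect every ordered pair of ids, including each id with itself
    return {(i, j): len(sets[i] & sets[j]) for i, j in product(sets, repeat=2)}

sim = similarity_matrix(rows)
ids = list(rows)
for i in ids:
    # One printed row per id, e.g. "200 [2, 2, 1, 2]", matching the question's matrix
    print(i, [sim[(i, j)] for j in ids])
```

Passing placeholder=None keeps the '0' entries, and then 200 and 300 would share 2 values (cc and 0) instead of 1, which is why the 0s have to be excluded. Because this version uses sets, a value repeated within one row is counted only once.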

Recommended Answer

Some preprocessing first: set id as the index and get rid of the 0s by replacing them with NaN, since we don't need them.

import numpy as np
df = df.set_index('id').replace('0', np.nan)  # assumes the 0s are stored as the string '0'

df    
    val1 val2 val3
id                
100   aa   bb   cc
200   bb   cc  NaN
300   aa   cc  NaN
400   bb   aa   cc
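The answer assumes df already exists. A minimal sketch that rebuilds it from the question's table and applies the same preprocessing; storing the zeros as the string '0' is an assumption about how the data was loaded (it is what read_csv typically yields for a column that also contains text).

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame; '0' is stored as a string here (assumption)
df = pd.DataFrame(
    {
        "id": [100, 200, 300, 400],
        "val1": ["aa", "bb", "aa", "bb"],
        "val2": ["bb", "cc", "cc", "aa"],
        "val3": ["cc", "0", "0", "cc"],
    }
)

# Same preprocessing as the answer: id becomes the index, '0' placeholders become NaN
df = df.set_index("id").replace("0", np.nan)
print(df)

# Number of real values per id; this is the diagonal of the final matrix
print(df.notna().sum(axis=1).tolist())  # → [3, 2, 2, 3]
```

If the zeros are integers rather than strings, .replace("0", np.nan) will not match them; DataFrame.replace also accepts a list, so .replace(["0", 0], np.nan) covers both cases.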

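Now, use a combination of pd.get_dummies and df.dot to get your similarity scores. get_dummies turns every (column, value) pair into a 0/1 indicator column such as val1_aa, the groupby merges the indicators for the same value across val1, val2 and val3, and the dot product of two rows then counts how many values the two ids share.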

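import pandas as pd

x = pd.get_dummies(df, dtype=int)  # one 0/1 column per (column, value) pair, e.g. val1_aa
# merge val1_aa, val2_aa, ... into one 'aa' column; grouping the transpose avoids the deprecated groupby(axis=1)
y = x.T.groupby(x.columns.str.split('_', n=1).str[1]).sum().T
y.dot(y.T)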

     100  200  300  400  
id                   
100    3    2    2    3
200    2    2    1    2
300    2    1    2    2
400    3    2    2    3
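A variant of the same idea that avoids splitting the generated column names on '_' (which would break if a value itself contained an underscore): stack the values into a long Series, build an id-by-value count table with pd.crosstab, and multiply it by its transpose. This is a swapped-in technique, not the answer's method; it is sketched under the assumption that df is preprocessed as in the answer.

```python
import numpy as np
import pandas as pd

# Question data, preprocessed as in the answer (id index, '0' -> NaN)
df = pd.DataFrame(
    {
        "id": [100, 200, 300, 400],
        "val1": ["aa", "bb", "aa", "bb"],
        "val2": ["bb", "cc", "cc", "aa"],
        "val3": ["cc", "0", "0", "cc"],
    }
).set_index("id").replace("0", np.nan)

# Long format: one (id, value) pair per cell; explicit dropna() so missing
# cells are excluded regardless of the pandas version's stack defaults
s = df.stack().dropna()

# id x value count table: how often each value occurs in each id's row
counts = pd.crosstab(s.index.get_level_values("id"), s.to_numpy())

# Shared-value counts for every pair of ids
sim = counts.dot(counts.T)
print(sim)
```

Like the get_dummies version, this counts a value once per occurrence, so a value repeated within one row contributes more than 1; applying counts.clip(upper=1) before the dot product would count each value at most once per id.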
