Constructing a co-occurrence matrix in python pandas
Question
I know how to do this in R. But is there a function in pandas that transforms a dataframe into an n x n co-occurrence matrix containing the counts of two aspects co-occurring?
For example, the matrix df:
import pandas as pd
df = pd.DataFrame({'TFD' : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack' : ['1', '0', '1', '1', '0', '0'],
                   'Trans' : ['1', '1', '1', '0', '0', '1'],
                   'Dop' : ['1', '0', '1', '0', '1', '1']}).set_index('TFD')
print(df)
>>>
     Dop Snack Trans
TFD
AA     1     1     1
SL     0     0     1
BB     1     1     1
D0     0     1     0
Dk     1     0     0
FF     1     0     1

[6 rows x 3 columns]
would yield:
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code.
Recommended answer
This is simple linear algebra: multiply the transpose of the matrix by the matrix itself (your example contains strings, so don't forget to convert them to integers):
>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
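To see why the product works: entry (i, j) of the result is the sum over rows of column i times column j, and with 0/1 values that product is 1 only when both columns are 1 in the same row. The diagonal therefore holds each column's own total. Here is a self-contained sketch of the same steps using the question's data; column order in the printed result may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# Rebuild the example frame from the question (the values are strings there).
df = pd.DataFrame({
    'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
    'Snack': ['1', '0', '1', '1', '0', '0'],
    'Trans': ['1', '1', '1', '0', '0', '1'],
    'Dop': ['1', '0', '1', '0', '1', '1'],
}).set_index('TFD')

# Convert the '0'/'1' strings to integers so they can be multiplied.
df_asint = df.astype(int)

# Entry (i, j) = sum over rows of col_i * col_j, i.e. the number of rows
# in which both columns are 1.
coocc = df_asint.T.dot(df_asint)

# The diagonal is each column's own count of ones.
print(coocc)
```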
If, as in the R answer, you want to reset the diagonal, you can use numpy's fill_diagonal:
>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
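One caveat: writing through coocc.values relies on pandas handing back a writable view of the data. As far as I know, with Copy-on-Write enabled in newer pandas releases that array can be read-only, in which case the in-place call fails. A more defensive sketch, under that assumption, fills the diagonal on an explicit copy and rebuilds the frame. The names counts, coocc_nodiag and upper are only illustrative, and the small matrix is typed in by hand to keep the example self-contained. Since the question mentions the matrix being mirrored, the last step also keeps just the upper triangle with np.triu.

```python
import numpy as np
import pandas as pd

# Co-occurrence matrix with the same values as the example above,
# entered by hand so this snippet runs on its own.
labels = ['Dop', 'Snack', 'Trans']
coocc = pd.DataFrame([[4, 2, 3], [2, 3, 2], [3, 2, 4]],
                     index=labels, columns=labels)

# Work on an explicit, writable copy of the underlying array.
counts = coocc.to_numpy(copy=True)
np.fill_diagonal(counts, 0)

# Rebuild a labelled frame; the original coocc is left untouched.
coocc_nodiag = pd.DataFrame(counts, index=coocc.index, columns=coocc.columns)

# The matrix is symmetric, so the strict upper triangle (k=1 skips the
# diagonal) already contains every pairwise count exactly once.
upper = pd.DataFrame(np.triu(counts, k=1),
                     index=coocc.index, columns=coocc.columns)

print(coocc_nodiag)
print(upper)
```

This does not make the multiplication itself cheaper; it only avoids storing or reading each pair twice afterwards.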