Constructing a co-occurrence matrix in python pandas


Question

I know how to do this in R. But is there a function in pandas that transforms a DataFrame into an n x n co-occurrence matrix containing the counts of two aspects co-occurring?

For example, the matrix df:

import pandas as pd

df = pd.DataFrame({'TFD' : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                    'Snack' : ['1', '0', '1', '1', '0', '0'],
                    'Trans' : ['1', '1', '1', '0', '0', '1'],
                    'Dop' : ['1', '0', '1', '0', '1', '1']}).set_index('TFD')

print(df)

>>> 
    Dop Snack Trans
TFD                
AA    1     1     1
SL    0     0     1
BB    1     1     1
D0    0     1     0
Dk    1     0     0
FF    1     0     1

[6 rows x 3 columns]

would yield:

       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0

Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code.

Recommended answer

This is simple linear algebra: multiply the transpose of the matrix by the matrix itself (your example contains strings, so don't forget to convert them to integers):

>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
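
Why this works: entry (i, j) of the product is the sum over rows of column i times column j, and for 0/1 data that product is 1 only in rows where both columns are 1. The diagonal therefore holds each column's own total. A small sketch that checks the product against explicit counting, assuming the same df as in the question (the helper name count_both is made up for this illustration):

```python
import pandas as pd

df = pd.DataFrame({'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack': ['1', '0', '1', '1', '0', '0'],
                   'Trans': ['1', '1', '1', '0', '0', '1'],
                   'Dop': ['1', '0', '1', '0', '1', '1']}).set_index('TFD')

X = df.astype(int)      # strings -> integers
coocc = X.T.dot(X)      # (columns x rows) . (rows x columns)

def count_both(a, b):
    """Hypothetical helper: number of rows where columns a and b are both 1."""
    return int(((X[a] == 1) & (X[b] == 1)).sum())

# Every entry of the product equals the explicit pairwise count.
matches = all(coocc.loc[a, b] == count_both(a, b)
              for a in X.columns for b in X.columns)

# The diagonal equals the per-column totals.
diagonal_is_column_sum = all(coocc.loc[c, c] == X[c].sum() for c in X.columns)

print(matches, diagonal_is_column_sum)
```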

If, as in the R answer, you want to reset the diagonal, you can use numpy's fill_diagonal:

>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
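
Modifying the frame through coocc.values relies on .values handing back a writable view of the frame's data, which may not hold in every pandas version or configuration (an assumption worth checking in your environment). A minimal sketch that sidesteps this by zeroing the diagonal on an explicit copy and rebuilding the frame; the function name cooccurrence and its zero_diagonal parameter are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

def cooccurrence(frame, zero_diagonal=True):
    """Hypothetical helper: column-by-column co-occurrence counts of a 0/1 frame."""
    X = frame.astype(int)
    counts = X.T.dot(X)
    if zero_diagonal:
        arr = counts.to_numpy(copy=True)   # explicit, writable copy
        np.fill_diagonal(arr, 0)           # in-place on the copy only
        counts = pd.DataFrame(arr, index=counts.index, columns=counts.columns)
    return counts

df = pd.DataFrame({'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack': ['1', '0', '1', '1', '0', '0'],
                   'Trans': ['1', '1', '1', '0', '0', '1'],
                   'Dop': ['1', '0', '1', '0', '1', '1']}).set_index('TFD')

result = cooccurrence(df)
print(result)
```

Passing zero_diagonal=False keeps the per-column totals on the diagonal, as in the first result above.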
