Pandas Crosstabulation and counting

Question

I am using Python pandas. I have a column of strings, each holding one or more comma-separated names, and I would like to count how often each pair of names occurs together.

For example, I have the following input:

1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas

I would like to get the following output. The combination of Andi and Thomas does not appear in the data, but Cindy and Thomas appear together twice:

          Andi  Thomas  Cindy
    Andi    1     0      1
    Thomas  0     1      2
    Cindy   1     2      1

Does anybody have an idea how I could handle this? That would be really great!

Many thanks and regards,

Andi

Recommended answer

You can generate dummy columns first:

df['A'].str.get_dummies(', ')
Out: 
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1

Then use them in a dot product:

tab = df['A'].str.get_dummies(', ')

tab.T.dot(tab)
Out: 
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
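For anyone who wants to reproduce the steps above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample input in the question, and the column name `A` is simply taken from the answer's code; the real data may be laid out differently.

```python
import pandas as pd

# Sample data from the question; the column name "A" follows the answer's code.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})

# One indicator column per name: 1 if the name occurs in that row, else 0.
tab = df["A"].str.get_dummies(", ")

# Entry (i, j) of tab.T.dot(tab) sums tab[i] * tab[j] over all rows,
# which is the number of rows that contain both name i and name j.
co_occurrence = tab.T.dot(tab)
print(co_occurrence)
```

The dot product works here because the product of two 0/1 indicators is 1 only when both names are present in the same row, so summing over rows counts the shared rows.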

The diagonal entries give the number of occurrences of each person. If you need the diagonal set to 1, there are several alternatives. One of them is np.fill_diagonal from NumPy.

import numpy as np

co_occurrence = tab.T.dot(tab)
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out: 
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
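One caveat with filling the diagonal in place: depending on the pandas version and its copy-on-write settings, `.values` may hand back a read-only array or a copy, in which case the in-place fill can fail or have no effect on the DataFrame. The sketch below is one possible variant, not the answer's original code; it fills an explicit copy and rebuilds the DataFrame. The names `counts` and `result` are illustrative, and the data is again the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the column name "A" follows the answer's code.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})
tab = df["A"].str.get_dummies(", ")
co_occurrence = tab.T.dot(tab)

# Work on an explicit, writable copy of the counts.
counts = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(counts, 1)

# Rebuild a labelled DataFrame; co_occurrence itself is left unchanged.
result = pd.DataFrame(
    counts, index=co_occurrence.index, columns=co_occurrence.columns
)
print(result)
```

This keeps the original counts available in `co_occurrence` in case the per-person totals on the diagonal are still needed later.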
