Pandas Crosstabulation and counting
Problem description
I am using Python Pandas. I have a column of strings, each holding a comma-separated list of names, and I would like a cross-tabulation of how often the names occur together.
For example, I have the following input:
1: Andi
2: Andi, Cindy
3: Thomas, Cindy
4: Cindy, Thomas
I would like to get the following output:
        Andi  Thomas  Cindy
Andi       1       0      1
Thomas     0       1      2
Cindy      1       2      1
Hence, the combination of Andi and Thomas does not appear in the data, but Cindy and Thomas appear twice.
Does anybody have an idea how I could handle this? That would be really great!
Many thanks and regards,
Andi
Recommended answer
You can first generate dummy columns:
df['A'].str.get_dummies(', ')
Out:
   Andi  Cindy  Thomas
0     1      0       0
1     1      1       0
2     0      1       1
3     0      1       1
and then use it in a dot product:
tab = df['A'].str.get_dummies(', ')
tab.T.dot(tab)
Out:
        Andi  Cindy  Thomas
Andi       2      1       0
Cindy      1      3       2
Thomas     0      2       2
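The snippets above assume a DataFrame df with a string column 'A', which the answer never defines. A minimal self-contained sketch, assuming the question's four rows are stored that way (the construction of df and the astype(int) cast are additions, not part of the original answer):

```python
import pandas as pd

# Hypothetical frame reproducing the question's input; the column name 'A'
# follows the answer's code.
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})

# One indicator column per name, splitting each string on ', '.
# The cast to int makes sure the dot product counts rather than depending
# on whatever dtype the indicators come back as.
tab = df["A"].str.get_dummies(", ").astype(int)

# (names x rows) dot (rows x names) -> names x names co-occurrence counts
co_occurrence = tab.T.dot(tab)
print(co_occurrence)
```

Each off-diagonal cell counts the rows in which both names appear, so the order of names within a row ("Thomas, Cindy" versus "Cindy, Thomas") does not matter.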
Diagonal entries will give you the number of occurrences for each person. If you need to set the diagonals to 1, there are several alternatives. One of them is np.fill_diagonal from numpy.
import numpy as np

co_occurrence = tab.T.dot(tab)
# Overwrite the diagonal of the underlying array in place with 1
np.fill_diagonal(co_occurrence.values, 1)
co_occurrence
Out:
        Andi  Cindy  Thomas
Andi       1      1       0
Cindy      1      1       2
Thomas     0      2       1
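Writing through co_occurrence.values relies on that attribute returning a writable view of the frame's data. As far as I know, this may not hold when pandas copy-on-write is enabled, where the array can come back read-only. A hedged variant that fills the diagonal on an explicit copy and then reorders the labels to match the layout in the question (the names counts, result and order are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical frame reproducing the question's input
df = pd.DataFrame({"A": ["Andi", "Andi, Cindy", "Thomas, Cindy", "Cindy, Thomas"]})
tab = df["A"].str.get_dummies(", ").astype(int)
counts = tab.T.dot(tab)

# Fill the diagonal on an explicit, writable copy of the data
values = counts.to_numpy(copy=True)
np.fill_diagonal(values, 1)
result = pd.DataFrame(values, index=counts.index, columns=counts.columns)

# Reorder rows and columns to the order used in the question
order = ["Andi", "Thomas", "Cindy"]
result = result.loc[order, order]
print(result)
```

This leaves counts untouched, so the per-person occurrence totals on its diagonal stay available if they are needed later.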