Pandas and groupby: count the number of matches in two different columns

Published: 2020/5/24 2:41:07. Tags: python, pandas, dataframe, pandas-groupby

Problem description

I would like to count the number of matches after a groupby in a pandas dataframe.

claim   event   material1   material2
A       X       M1          M2
A       X       M2          M3
A       X       M3          M0
A       X       M4          M4
A       Y       M5          M5
A       Y       M6          M0
B       Z       M7          M0
B       Z       M8          M0
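To reproduce the question, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the values are exactly as shown above and that every column holds plain strings.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})
print(df)
```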

First, I group by the pair (claim, event), and for each of these groups I want to count the number of matches between the columns material1 and material2.

For the grouping, I have grouped = df.groupby(['claim', 'event']), but I don't know how to compare the two material columns within each group.

It should return the following DataFrame:

claim   event   matches
A       X       3          
A       Y       1          
B       Z       0

Do you know how to do this?

Recommended answer

Use isin to compare the columns, then group by claim and event and aggregate with sum; finally cast to int and call reset_index to turn the MultiIndex levels back into columns:

# Boolean mask: True where material1 occurs anywhere in the material2 column
a = df['material1'].isin(df['material2'])
df = a.groupby([df['claim'], df['event']]).sum().astype(int).reset_index(name='matches')
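To see what isin contributes before grouping, the sketch below prints the boolean mask next to each row. It rebuilds the question's data (an assumption about how df was created), and in_material2 is just an illustrative column name. Each material1 value is tested against every value in the material2 column of the whole DataFrame.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

# True where the row's material1 value occurs anywhere in the material2 column
mask = df['material1'].isin(df['material2'])
print(df.assign(in_material2=mask))
```

On this data the mask is True for M2, M3, M4 and M5, which sums to 3 for (A, X), 1 for (A, Y) and 0 for (B, Z).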

Solution that first assigns the 0/1 flags to a new column:

# 1 where material1 occurs anywhere in material2, else 0; then sum per group
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'])['matches'].sum().reset_index()

Solution by @Wen, thank you:

df['matches'] = df['material1'].isin(df['material2']).astype(int)
# as_index=False keeps claim and event as columns, so reset_index is not needed
df = df.groupby(['claim', 'event'], as_index=False)['matches'].sum()
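The solutions above call isin on whole columns, so a material1 value counts as a match if it appears in material2 anywhere in the DataFrame, not only in its own (claim, event) group. They give the expected result on the question's data, but the hypothetical two-row frame below (made up for illustration) shows where a whole-column comparison and a per-group comparison diverge. The names whole and per_group are illustrative.

```python
import pandas as pd

# Hypothetical frame: M1 appears in material2 only in group (A, X)
df = pd.DataFrame({
    'claim': ['A', 'A'],
    'event': ['X', 'Y'],
    'material1': ['M9', 'M1'],
    'material2': ['M1', 'M0'],
})

# Whole-column comparison, as in the solutions above
whole = (df['material1'].isin(df['material2']).astype(int)
           .groupby([df['claim'], df['event']]).sum())

# Per-group comparison: each group only sees its own material2 values
per_group = (df.groupby(['claim', 'event'])[['material1', 'material2']]
               .apply(lambda g: int(g['material1'].isin(g['material2']).sum())))

print(whole.to_dict())
print(per_group.to_dict())
```

Here the whole-column version reports one match for (A, Y) because M1 occurs in the material2 value of group (A, X), while the per-group version reports none. Which behaviour is wanted depends on what the question means by a match.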

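I think this should be slower on larger DataFrames, because apply calls a Python function once per group. Unlike the solutions above, it compares material1 only with the material2 values of the same group: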

# apply runs the lambda once per (claim, event) group
df = (df.groupby(['claim', 'event'])
        .apply(lambda x: x['material1'].isin(x['material2']).astype(int).sum())
        .reset_index(name='matches'))

print(df)
  claim event  matches
0     A     X        3
1     A     Y        1
2     B     Z        0
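If the comparison should stay within each (claim, event) group but apply is too slow, one vectorized option (an alternative not taken from the answer above) is to test (claim, event, material1) tuples for membership among (claim, event, material2) tuples with MultiIndex.isin. This is a sketch that rebuilds the question's data; keys, left, right and out are illustrative names.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

keys = ['claim', 'event']
# Pair each material with its group keys so membership is tested per group
left = pd.MultiIndex.from_frame(df[keys + ['material1']])
right = pd.MultiIndex.from_frame(df[keys + ['material2']])

out = (df[keys]
       .assign(matches=left.isin(right).astype(int))
       .groupby(keys, as_index=False)['matches'].sum())
print(out)
```

Including the group keys in the tuples is what restricts each lookup to its own group, without calling a Python function per group.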
