Pandas and groupby: count the number of matches in two different columns
Problem description
I would like to count the number of matches after a groupby in a pandas dataframe.
claim event material1 material2
A X M1 M2
A X M2 M3
A X M3 M0
A X M4 M4
A Y M5 M5
A Y M6 M0
B Z M7 M0
B Z M8 M0
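To reproduce the example, the table above could be built as a DataFrame like this. This is a minimal sketch: the construction is not part of the original post, and it assumes all four columns hold plain strings.

```python
import pandas as pd

# Sample data from the question, rebuilt column by column.
df = pd.DataFrame({
    "claim":     ["A", "A", "A", "A", "A", "A", "B", "B"],
    "event":     ["X", "X", "X", "X", "Y", "Y", "Z", "Z"],
    "material1": ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"],
    "material2": ["M2", "M3", "M0", "M4", "M5", "M0", "M0", "M0"],
})
print(df)
```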
First, I group by the pair (claim, event), and for each of these groups I want to count the number of matches between the columns material1 and material2.
For the groupby, I have
grouped = df.groupby(['claim', 'event'])
but then I don't know how to compare the two columns.
It should return the following DataFrame:
claim event matches
A X 3
A Y 1
B Z 0
Do you have any idea how to do this?
Recommended answer
Use isin to compare the columns, group by the claim and event columns and aggregate with sum, then cast to int and call reset_index to turn the MultiIndex levels into columns:
a = (df['material1'].isin(df['material2']))
df = a.groupby([df['claim'], df['event']]).sum().astype(int).reset_index(name='matches')
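As a self-contained check, the snippet above can be run against the sample data from the question. This is only a sketch: the DataFrame construction and the variable name result are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    "claim":     ["A", "A", "A", "A", "A", "A", "B", "B"],
    "event":     ["X", "X", "X", "X", "Y", "Y", "Z", "Z"],
    "material1": ["M1", "M2", "M3", "M4", "M5", "M6", "M7", "M8"],
    "material2": ["M2", "M3", "M0", "M4", "M5", "M0", "M0", "M0"],
})

# True where a material1 value occurs anywhere in the material2 column.
a = df["material1"].isin(df["material2"])

# Sum the boolean flags per (claim, event) pair and flatten the index.
result = (
    a.groupby([df["claim"], df["event"]])
     .sum()
     .astype(int)
     .reset_index(name="matches")
)
print(result)
```

Here the result is kept in a separate variable so that the original df is still available for trying the other variants.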
Solution that assigns to a new column:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'])['matches'].sum().reset_index()
Solution by @Wen, thank you:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'], as_index=False)['matches'].sum()
I think this one should be slower on larger DataFrames:
df = (df.groupby(['claim', 'event'])
        .apply(lambda x: x['material1'].isin(x['material2']).astype(int).sum())
        .reset_index(name='matches'))
print(df)
claim event matches
0 A X 3
1 A Y 1
2 B Z 0
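One point worth checking before choosing a variant: the first solutions call isin on the whole material2 column, while the apply version tests membership inside each group. On the sample data both give the same counts, but they can differ when a material1 value appears in material2 only in another group. The sketch below illustrates this with made-up data (the values and variable names are hypothetical, not from the original post), and it selects the two material columns before apply, which is a small variation on the answer's code.

```python
import pandas as pd

# Hypothetical data: M9 is in material1 of group (A, X),
# but in material2 only in group (B, Z).
df = pd.DataFrame({
    "claim":     ["A", "A", "B", "B"],
    "event":     ["X", "X", "Z", "Z"],
    "material1": ["M9", "M1", "M7", "M8"],
    "material2": ["M0", "M0", "M9", "M0"],
})

# Global membership: material1 is tested against the entire material2 column.
global_matches = (
    df["material1"].isin(df["material2"])
      .groupby([df["claim"], df["event"]])
      .sum()
      .astype(int)
      .reset_index(name="matches")
)

# Per-group membership: material1 is tested only against material2 of the same group.
group_matches = (
    df.groupby(["claim", "event"])[["material1", "material2"]]
      .apply(lambda g: int(g["material1"].isin(g["material2"]).sum()))
      .reset_index(name="matches")
)

print(global_matches)  # counts M9 as a match for (A, X)
print(group_matches)   # does not, because M9 is absent from material2 in (A, X)
```

If matches should only count within the same (claim, event) pair, the per-group variant is the one that follows that definition, at the cost of the slower apply.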