pandas groupby count string occurrence over column
Question
I want to count the occurrences of a string in a column of a grouped pandas dataframe.
Suppose I have the following dataframe:
catA  catB  scores
A     X     6-4 RET
A     X     6-4 6-4
A     Y     6-3 RET
B     Z     6-0 RET
B     Z     6-1 RET
First, I want to group by catA and catB. Then, for each of these groups, I want to count the occurrences of RET in the scores column.
The result should look like this:
catA catB RET
A X 1
A Y 1
B Z 2
Grouping by the two columns is easy: grouped = df.groupby(['catA', 'catB'])
But what comes next?
Recommended answer
Call apply on the 'scores' column of the groupby object, use the vectorised str method contains to filter each group down to the rows containing 'RET', and then call count:
In [34]:
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())
Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64
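The result above is a Series with a (catA, catB) MultiIndex, whereas the question asks for a table with a RET column. A minimal self-contained sketch of one way to get there follows. It assumes the dataframe is built by hand from the sample rows in the question, and the variable name ret_counts is only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Same apply-based count as in the answer, then turn the MultiIndex
# back into ordinary columns and name the count column "RET".
ret_counts = (
    df.groupby(["catA", "catB"])["scores"]
      .apply(lambda s: s[s.str.contains("RET")].count())
      .reset_index(name="RET")
)

print(ret_counts)
```

Here reset_index(name="RET") moves catA and catB out of the index and labels the counts, which matches the layout requested in the question.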
To assign the result as a column, use transform so that the aggregation returns a Series whose index is aligned with the original df:
In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df
Out[35]:
  catA catB   scores  count
0    A    X  6-4 RET      1
1    A    X  6-4 6-4      1
2    A    Y  6-3 RET      1
3    B    Z  6-0 RET      2
4    B    Z  6-1 RET      2
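As an alternative that is not part of the original answer, the lambda can be avoided by computing the 'RET' flag once for the whole column and then summing it per group. The sketch below assumes the same sample data as the question and that 'RET' should be matched as a literal substring (hence regex=False); the names has_ret and per_group are illustrative only:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# 1 where the score contains the literal substring "RET", else 0.
has_ret = df["scores"].str.contains("RET", regex=False).astype(int)

# Group the flags by the two key columns.
keys = [df["catA"], df["catB"]]

# One count per group (Series indexed by catA, catB).
per_group = has_ret.groupby(keys).sum()

# The same count broadcast back to every row of the original frame.
df["count"] = has_ret.groupby(keys).transform("sum")

print(per_group)
print(df)
```

Summing a 0/1 flag gives the same numbers as filtering and counting, and using the built-in "sum" aggregation instead of a Python lambda is likely to be faster on large frames.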