Conditional counting within groups

Published 2021/5/13 19:50:09 | Tags: python, python-3.x, pandas, grouping

Question

I wanted to do conditional counting after groupby; for example, group by values of column A, and then count within each group how often value 5 appears in column B.

If I were doing this for the entire DataFrame, it would just be len(df[df['B']==5]). So I hoped I could do df.groupby('A')[df['B']==5].size(), but I guess boolean indexing doesn't work within GroupBy objects.

Example:

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})
groups = df.groupby('A')
# some more code
# in the end, I want to get pd.Series({0: 1, 4: 2, 6: 0})
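For reference, here is a minimal sketch of the whole-DataFrame count the question mentions, using the same example data. It also shows that summing the boolean mask gives the same number, since each True counts as 1. That is the idea behind the custom-function option in the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Boolean mask: True on rows where B == 5.
mask = df['B'] == 5

# Count by filtering the frame and taking its length ...
n_by_len = len(df[mask])

# ... or by summing the mask directly (True -> 1, False -> 0).
n_by_sum = int(mask.sum())

print(n_by_len, n_by_sum)  # 3 3
```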

Recommended answer

Select all rows where B equals 5, and then apply groupby/size:

In [43]: df.loc[df['B']==5].groupby('A').size()
Out[43]: 
A
0    1
4    2
dtype: int64
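Below, the same one-liner is split into its three steps, as a sketch against the question's example data. Groups with no matching rows (A == 6 here) are dropped at the filtering step, so they are missing from the result. The reindexing step further down deals with that.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Step 1: boolean mask marking the rows where B == 5.
mask = df['B'] == 5

# Step 2: keep only those rows.
matches = df.loc[mask]

# Step 3: count the remaining rows in each A group.
counts = matches.groupby('A').size()

# A == 6 has no matching rows, so it does not appear.
print(counts.to_dict())  # {0: 1, 4: 2}
```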

Alternatively, you could use groupby/agg with a custom function:

In [44]: df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
Out[44]: 
A
0    1
4    2
6    0
Name: B, dtype: int64
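You may want the count as a named column of a DataFrame rather than as a Series. In that case, pandas' named aggregation (available since pandas 0.25) accepts the same lambda. In this sketch the output column name n_fives is an arbitrary, illustrative choice. Because the lambda runs on every group, A == 6 is kept with a count of 0.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Named aggregation: new_column=(source_column, function).
# 'n_fives' is an arbitrary, illustrative column name.
out = df.groupby('A').agg(n_fives=('B', lambda ser: (ser == 5).sum()))

print(out)
```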

Note that, generally speaking, using agg with a custom function will be slower than using groupby with a built-in method such as size, because the custom function is called from Python once per group. So prefer the first option over the second.

In [45]: %timeit df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
1000 loops, best of 3: 927 µs per loop

In [46]: %timeit df.loc[df['B']==5].groupby('A').size()
1000 loops, best of 3: 649 µs per loop
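The %timeit lines above are IPython magics. Outside IPython, a rough equivalent with the standard-library timeit module might look like the sketch below. The function names are hypothetical. Absolute numbers depend on your machine and pandas version, and on a six-row frame this mostly measures fixed per-call overhead.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})


def agg_with_lambda():
    # Option 2: custom function, called once per group.
    return df.groupby('A')['B'].agg(lambda ser: (ser == 5).sum())


def filter_then_size():
    # Option 1: filter rows first, then use the built-in size().
    return df.loc[df['B'] == 5].groupby('A').size()


N = 200  # number of calls per measurement
for name, func in [('agg + lambda', agg_with_lambda), ('filter + size', filter_then_size)]:
    # Take the best of 3 repeats, mirroring "best of 3" in the %timeit output.
    best = min(timeit.repeat(func, number=N, repeat=3))
    print(f"{name}: {best / N * 1e6:.0f} µs per loop")
```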

To include A values where the size is zero, you could reindex the result:

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})
result = df.loc[df['B'] == 5].groupby('A').size()
result = result.reindex(df['A'].unique())

This yields:

A
0    1.0
4    2.0
6    NaN
dtype: float64
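The NaN, and the switch to float64, come from reindex filling in the missing label. To get the integer Series the question asks for, reindex accepts fill_value=0. The sketch below also shows a second variant that groups the boolean mask by column A and sums it. That variant is an alternative technique, not part of the original answer. It keeps every group without needing a reindex.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Variant 1: the answer's approach, with missing groups filled with 0 instead of NaN.
result = df.loc[df['B'] == 5].groupby('A').size()
result = result.reindex(df['A'].unique(), fill_value=0)

# Variant 2 (alternative): group the boolean mask by A and sum it.
# Every A value forms a group here, so no reindex is needed.
counts = (df['B'] == 5).groupby(df['A']).sum()

print(result.to_dict())  # {0: 1, 4: 2, 6: 0}
print(counts.to_dict())  # {0: 1, 4: 2, 6: 0}
```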
