Pandas: conditional rolling count
Question
I have a Series that looks like the following:
col
0 B
1 B
2 A
3 A
4 A
5 B
It's a time series, so the index is ordered by time.
For each row, I'd like to count how many times the value has appeared consecutively, i.e.:
Output:
col count
0 B 1
1 B 2
2 A 1 # Value does not match previous row => reset counter to 1
3 A 2
4 A 3
5 B 1 # Value does not match previous row => reset counter to 1
I found 2 related questions, but I can't figure out how to "write" that information as a new column in the DataFrame, for each row (as above). Using rolling_apply does not work well.
Recommended answer
I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, CodeShaman's solution counts total occurrences rather than consecutive ones).
df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount()+1
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
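To see why the one-liner works, it can help to split it into its three steps. The following is a minimal sketch that rebuilds the sample data from the question; the intermediate names `is_new_run` and `run_id` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# Step 1: flag rows whose value differs from the previous row.
# The first row is always flagged, because shift(1) puts a missing
# value there and any comparison against it is "not equal".
is_new_run = df["col"] != df["col"].shift(1)

# Step 2: a cumulative sum over the flags gives every run of
# consecutive equal values its own integer id.
run_id = is_new_run.cumsum()

# Step 3: cumcount() numbers the rows inside each run starting at 0,
# so add 1 to start counting at 1.
df["count"] = df.groupby(run_id).cumcount() + 1

print(df)
```

One thing to keep in mind with this approach: missing values never compare equal to each other, so if `col` contains consecutive NaN entries, each of them starts a new run and gets a count of 1.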