Use cumcount on pandas dataframe with a conditional increment
Problem description
Consider the dataframe
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 1],
        ['A', 1],
        ['B', 1],
        ['B', 0],
        ['A', 0],
        ['A', 1],
        ['B', 1]
    ], columns=['key', 'cond'])
I want to find a cumulative (running) count (starting at 1) for each key, where the count only increments if the previous row in the same group had cond == 1. When appended to the above dataframe, this would give
df_result = pd.DataFrame(
    [
        ['A', 1, 1],
        ['A', 1, 2],
        ['B', 1, 1],
        ['B', 0, 2],
        ['A', 0, 3],
        ['A', 1, 3],
        ['B', 1, 2]
    ], columns=['key', 'cond', 'new'])
Note that the cond value of the last row in each key group has no effect.
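To pin down the rule, here is a minimal pure-Python reference loop. The helper `conditional_count` is hypothetical (it is not part of pandas); it walks the rows in order and remembers, per key, the count and the cond value of that key's previous row. It runs in plain Python, so it is mostly useful as a slow but obvious baseline for checking vectorized solutions.

```python
import pandas as pd


def conditional_count(keys, conds):
    """Hypothetical reference helper: running count per key (starting at 1)
    that only increments when the previous row of the same key had cond == 1."""
    last_count = {}  # key -> count assigned to that key's previous row
    last_cond = {}   # key -> cond value of that key's previous row
    result = []
    for k, c in zip(keys, conds):
        if k not in last_count:
            count = 1  # first row of each group starts at 1
        else:
            # increment only if the previous row in this group had cond == 1
            count = last_count[k] + (1 if last_cond[k] == 1 else 0)
        last_count[k] = count
        last_cond[k] = c
        result.append(count)
    return result


df = pd.DataFrame(
    [['A', 1], ['A', 1], ['B', 1], ['B', 0], ['A', 0], ['A', 1], ['B', 1]],
    columns=['key', 'cond'],
)
df['new'] = conditional_count(df['key'], df['cond'])
print(df['new'].tolist())  # [1, 2, 1, 2, 3, 3, 2]
```

The last row of each group only updates `last_cond`, which is never read again, which is why its cond value has no effect.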
Just doing a simple groupby and cumcount
df.groupby('key').cumcount()
of course doesn't account for the cond value of the previous element. How can I take this into account?
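For comparison, a quick check of what plain cumcount gives on the first dataframe, shifted by 1 so that it starts at 1 as the question asks. It simply numbers the rows within each group, so it overshoots as soon as a group has seen a row with cond == 0.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 1], ['A', 1], ['B', 1], ['B', 0], ['A', 0], ['A', 1], ['B', 1]],
    columns=['key', 'cond'],
)

# cumcount numbers the rows of each group 0, 1, 2, ... regardless of cond
plain = df.groupby('key').cumcount() + 1  # +1 so counting starts at 1
expected = [1, 2, 1, 2, 3, 3, 2]          # desired result from the question

print(plain.tolist())  # [1, 2, 1, 2, 3, 4, 3]
# True where the plain count overshoots because an earlier row had cond == 0
print((plain != expected).tolist())
```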
Edit
Since some of the solutions below fail on certain edge cases, here is a more comprehensive dataframe for testing.
df = pd.DataFrame(
    [
        ['A', 0],
        ['A', 1],
        ['A', 1],
        ['B', 1],
        ['B', 0],
        ['A', 0],
        ['A', 1],
        ['B', 1],
        ['B', 0]
    ], columns=['key', 'cond'])
which, with the correct result appended, should give
df_result = pd.DataFrame(
    [
        ['A', 0, 1],
        ['A', 1, 1],
        ['A', 1, 2],
        ['B', 1, 1],
        ['B', 0, 2],
        ['A', 0, 3],
        ['A', 1, 3],
        ['B', 1, 2],
        ['B', 0, 3]
    ], columns=['key', 'cond', 'new'])
Recommended answer
df['new'] = (
    df.groupby('key')['cond']
      # previous row's cond within the group; the first row of each group gets 1
      .transform(lambda x: x.shift().fillna(1).cumsum())
      .astype(int)
)
df
  key  cond  new
0   A     1    1
1   A     1    2
2   B     1    1
3   B     0    2
4   A     0    3
5   A     1    3
6   B     1    2
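The answer's idea is that each row's count equals 1 plus the number of earlier rows in the same group with cond == 1: shifting moves each cond onto the next row, the missing value on a group's first row becomes the starting 1, and the cumulative sum adds them up. A sketch of the same idea without a per-group lambda, using GroupBy.shift with fill_value followed by a grouped cumsum, checked against the edge-case dataframe from the question. It should produce the same numbers as the answer; whether it is faster will depend on the data and the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 0], ['A', 1], ['A', 1], ['B', 1], ['B', 0],
     ['A', 0], ['A', 1], ['B', 1], ['B', 0]],
    columns=['key', 'cond'],
)

g = df.groupby('key')['cond']
# cond of the previous row in the same group; the first row of each group
# has no predecessor, so fill it with 1 to make the count start at 1
prev = g.shift(fill_value=1)
# running total of the shifted values, restarted for every key
df['new'] = prev.groupby(df['key']).cumsum()

print(df)
```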