Python: Counting cumulative occurrences of values in a pandas series
Question
I have a DataFrame that looks like this:
    fruit
0  orange
1  orange
2  orange
3    pear
4  orange
5   apple
6   apple
7    pear
8    pear
9  orange
I want to add a column that counts the cumulative occurrences of each value, i.e.
    fruit  cum_count
0  orange          1
1  orange          2
2  orange          3
3    pear          1
4  orange          4
5   apple          1
6   apple          2
7    pear          2
8    pear          3
9  orange          5
At the moment I am doing this:
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
df['cum_count'] = [(df.fruit[0:i+1] == x).sum() for i, x in df.fruit.items()]
... which is fine for 10 rows, but takes a really long time when I'm trying to do the same thing with a few million rows. Is there a more efficient way to do this?
Recommended answer
You could use groupby and cumcount:
df['cum_count'] = df.groupby('fruit').cumcount() + 1
In [16]: df
Out[16]:
    fruit  cum_count
0  orange          1
1  orange          2
2  orange          3
3    pear          1
4  orange          4
5   apple          1
6   apple          2
7    pear          2
8    pear          3
9  orange          5
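For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question and checks the groupby result against a plain-Python running count. The helper name running_count is made up for illustration and is not part of pandas.

```python
from collections import defaultdict

import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "fruit": ["orange", "orange", "orange", "pear", "orange",
              "apple", "apple", "pear", "pear", "orange"]
})

# cumcount() numbers the rows within each group starting at 0,
# so add 1 to get a 1-based running count.
df["cum_count"] = df.groupby("fruit").cumcount() + 1


def running_count(values):
    """Hypothetical reference helper: single-pass running count per value."""
    seen = defaultdict(int)
    out = []
    for v in values:
        seen[v] += 1
        out.append(seen[v])
    return out


# Both approaches should agree on the example data.
assert df["cum_count"].tolist() == running_count(df["fruit"])
print(df)
```

The row order of the frame is left untouched; cumcount only looks at the order in which each value appears.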
Timing
In [8]: %timeit [(df.fruit[0:i+1] == x).sum() for i, x in df.fruit.items()]
100 loops, best of 3: 3.76 ms per loop
In [9]: %timeit df.groupby('fruit').cumcount() + 1
1000 loops, best of 3: 926 µs per loop
So it is about 4 times faster (3.76 ms vs. 926 µs per loop).
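The list comprehension rescans a growing slice for every row, so its cost should grow roughly quadratically with the number of rows, and the gap is likely to widen well beyond 4x on larger data. Here is a rough way to check that on your own machine. The frame size, the seed, and the names n, slow and fast are arbitrary choices for this sketch, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and seed are arbitrary assumptions.
n = 2_000
rng = np.random.default_rng(0)
df = pd.DataFrame({"fruit": rng.choice(["orange", "pear", "apple"], size=n)})


def slow():
    # Same idea as the list comprehension in the question, written with
    # enumerate and iloc so the slice is purely positional.
    return [(df.fruit.iloc[: i + 1] == x).sum() for i, x in enumerate(df.fruit)]


def fast():
    # Vectorized running count per group, 1-based.
    return df.groupby("fruit").cumcount() + 1


# Sanity check: both approaches give the same counts.
assert list(slow()) == fast().tolist()

print("list comprehension:", timeit.timeit(slow, number=1), "s")
print("groupby.cumcount:  ", timeit.timeit(fast, number=1), "s")
```

Raising n in steps (for example doubling it) shows how each approach scales without having to wait on a multi-million-row run of the slow version.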