Python: Counting cumulative occurrences of values in a pandas series

Question

I have a DataFrame that looks like this:

    fruit
0  orange
1  orange
2  orange
3    pear
4  orange
5   apple
6   apple
7    pear
8    pear
9  orange

I want to add a column that counts the cumulative occurrences of each value, i.e.

    fruit  cum_count
0  orange          1
1  orange          2
2  orange          3
3    pear          1
4  orange          4
5   apple          1
6   apple          2
7    pear          2
8    pear          3
9  orange          5

At the moment I'm doing this:

df['cum_count'] = [(df.fruit[0:i+1] == x).sum() for i, x in df.fruit.items()]

... which is fine for 10 rows, but takes a really long time when I'm trying to do the same thing with a few million rows. Is there a more efficient way to do this?

Recommended answer

You could use groupby and cumcount:

df['cum_count'] = df.groupby('fruit').cumcount() + 1

In [16]: df
Out[16]:
    fruit  cum_count
0  orange          1
1  orange          2
2  orange          3
3    pear          1
4  orange          4
5   apple          1
6   apple          2
7    pear          2
8    pear          3
9  orange          5
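
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It rebuilds the example DataFrame from the question and applies the same groupby/cumcount call. The `s.groupby(s)` variant for a bare Series is not part of the original answer; it is included only as an assumption about how the same idea could carry over when there is no DataFrame.

```python
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "fruit": ["orange", "orange", "orange", "pear", "orange",
              "apple", "apple", "pear", "pear", "orange"]
})

# cumcount() numbers the rows within each group starting at 0,
# so adding 1 turns it into a running count of occurrences.
df["cum_count"] = df.groupby("fruit").cumcount() + 1
print(df)
# cum_count: [1, 2, 3, 1, 4, 1, 2, 2, 3, 5]

# Assumed variant for a plain Series: group the Series by its own values.
s = df["fruit"]
series_cum_count = s.groupby(s).cumcount() + 1
```

cumcount keeps the original row order and index, so the result can be assigned straight back as a column without sorting or re-aligning.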

Timings

In [8]: %timeit [(df.fruit[0:i+1] == x).sum() for i, x in df.fruit.items()]
100 loops, best of 3: 3.76 ms per loop

In [9]: %timeit df.groupby('fruit').cumcount() + 1
1000 loops, best of 3: 926 µs per loop

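So it's about 4 times faster on this 10-row DataFrame (3.76 ms vs. 926 µs). The gap should widen as the number of rows grows: the list comprehension rescans a growing slice for every row, so its cost grows roughly quadratically, while groupby with cumcount scales roughly linearly.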
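
The timings above are for 10 rows only. To see how the two approaches compare on more data, something like the following rough sketch could be used. The synthetic data, the row count `n`, and the helper names `slow_cum_count` and `fast_cum_count` are all made up for illustration, and the measured times will vary by machine and pandas version, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical test data: n random fruit names (n is kept small so the
# slow version still finishes quickly).
rng = np.random.default_rng(0)
n = 2000
big = pd.DataFrame({"fruit": rng.choice(["orange", "pear", "apple"], size=n)})


def slow_cum_count(s):
    # Same idea as the question's list comprehension: for each row,
    # count matches in the prefix up to and including that row.
    return [int((s.iloc[: i + 1] == x).sum()) for i, x in enumerate(s)]


def fast_cum_count(frame):
    # The approach from the answer.
    return frame.groupby("fruit").cumcount() + 1


start = time.perf_counter()
slow = slow_cum_count(big["fruit"])
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = fast_cum_count(big)
fast_seconds = time.perf_counter() - start

# Both approaches should produce identical counts.
same = fast.tolist() == slow
print(f"identical: {same}, slow: {slow_seconds:.4f}s, fast: {fast_seconds:.4f}s")
```

Raising `n` makes the difference more visible, but the list comprehension quickly becomes impractical, which matches the problem described in the question.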
