Pandas cumulative sum on column with condition


Question

I couldn't find an answer elsewhere, so I need to ask. That is probably because I don't know what this operation is properly called. (English is not my native language.)

I have a large datetime data frame. Time is important here. One column in df has the values [NaN, 1, -1]. I need a fast way to compute a cumulative sum that resets whenever the value changes.

Example:

Time                     sign    desire_value
2014-01-24 05:00:00      NaN     NaN
2014-01-24 06:00:00      NaN     NaN
2014-01-24 07:00:00      NaN     NaN
2014-01-24 08:00:00      1       1
2014-01-24 09:00:00      1       2
2014-01-24 10:00:00      1       3
2014-01-24 11:00:00      -1      1
2014-01-24 12:00:00      -1      2
2014-01-24 13:00:00      -1      3
2014-01-24 14:00:00      -1      4
2014-01-24 15:00:00      -1      5
2014-01-24 16:00:00      1       1
2014-01-24 17:00:00      1       2
2014-01-24 18:00:00      1       3
2014-01-24 19:00:00      -1      1
2014-01-24 20:00:00      -1      2
2014-01-24 21:00:00      1       1
2014-01-24 22:00:00      1       2

I have a working solution that loops over the rows, but it is not very efficient.

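    import pandas as pd

    # previous row's sign, used to detect a change
    df['sign_1'] = df['sign'].shift(1)

    acc = 0  # length of the current run of equal signs
    for index, row in df.iterrows():
        if pd.isna(row['sign']):
            # no sign on this row: leave the result empty
            df.loc[index, 'desire_value'] = None
        elif row['sign'] == row['sign_1']:
            # same sign as the previous row: extend the run
            acc += 1
            df.loc[index, 'desire_value'] = acc
        else:
            # sign changed: start a new run
            acc = 1
            df.loc[index, 'desire_value'] = acc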

I cannot find any array-based approach. I found that the best way to iterate efficiently in Python is to use Cython, but is there a more "Pythonic" way to solve this?

Recommended answer

This is an itertools-like groupby. (The column called sign in the question is named value here.)

In [86]: v = df['value'].dropna()

The grouper marks each point where the value differs from the previous row (a group breakpoint); cumsum then turns those marks into a separate label for each group.

In [87]: grouper = (v!=v.shift()).cumsum()

In [88]: grouper
Out[88]: 
3     1
4     1
5     1
6     2
7     2
8     2
9     2
10    2
11    3
12    3
13    3
14    4
15    4
16    5
17    5
Name: value, dtype: int64
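To see why the comparison with the shifted series produces group labels, here is a minimal sketch on a small made-up series (the data and the names s and changed are illustrative assumptions, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Small illustrative series; the name "value" mirrors the answer's column.
s = pd.Series([np.nan, 1, 1, -1, -1, -1, 1], name="value")
v = s.dropna()

# True wherever a value differs from the one before it. The first row is
# always True, because it is compared against the NaN that shift() puts there.
changed = v != v.shift()

# The running total of the True flags gives every run its own label.
grouper = changed.cumsum()
print(grouper.tolist())  # [1, 1, 2, 2, 2, 3]
```

Because v keeps its original index after dropna(), the labels stay aligned with the rows they came from.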

Then it is just a simple cumsum within each group:

In [89]: df.groupby(grouper)['value'].cumsum()
Out[89]: 
0    NaN
1    NaN
2    NaN
3      1
4      2
5      3
6     -1
7     -2
8     -3
9     -4
10    -5
11     1
12     2
13     3
14    -1
15    -2
16     1
17     2
dtype: float64
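The steps above can be put together as a self-contained sketch on the sample data from the question. As an assumption made here for portability, the sum is taken on the NaN-free series v and then reindexed back to the full frame, since the way groupby treats missing group keys has varied between pandas versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (integer index, as in the answer's output).
values = [np.nan] * 3 + [1] * 3 + [-1] * 5 + [1] * 3 + [-1] * 2 + [1] * 2
df = pd.DataFrame({"value": values})

v = df["value"].dropna()
grouper = (v != v.shift()).cumsum()

# Cumulative sum inside each run, computed on the non-NaN rows only,
# then aligned back to the full index so the leading rows stay NaN.
signed = v.groupby(grouper).cumsum().reindex(df.index)
print(signed)
```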

You can of course apply .abs() to the result above if you do in fact want the absolute values.
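As a sketch of that last step, the following rebuilds the question's sample data and derives the desire_value column with .abs(). It also shows a variant that is an assumption of this example rather than part of the answer: since every value is 1 or -1, counting rows within each run with cumcount() gives the same numbers.

```python
import numpy as np
import pandas as pd

values = [np.nan] * 3 + [1] * 3 + [-1] * 5 + [1] * 3 + [-1] * 2 + [1] * 2
df = pd.DataFrame({"value": values})

v = df["value"].dropna()
grouper = (v != v.shift()).cumsum()

# Absolute value of the per-run cumulative sum; rows without a sign stay NaN
# because column assignment aligns on the index.
df["desire_value"] = v.groupby(grouper).cumsum().abs()

# Variant (assumes the values are only ever 1 or -1): position within the run.
run_position = v.groupby(grouper).cumcount() + 1

print(df)
```

The cumcount() variant does not depend on the magnitude of the values, so it only matches the cumulative sum when each row contributes exactly 1 in absolute terms.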
