pandas rolling computation with window based on values instead of counts


Question

I'm looking for a way to do something like the various rolling_* functions of pandas, but I want the window of the rolling computation to be defined by a range of values (say, a range of values of a column of the DataFrame), not by the number of rows in the window.

As an example, suppose I have this data:

>>> print(d)
   RollBasis  ToRoll
0          1       1
1          1       4
2          1      -5
3          2       2
4          3      -4
5          5      -2
6          8       0
7         10     -13
8         12      -2
9         13      -5

If I do something like rolling_sum(d, 5), I get a rolling sum in which each window contains 5 rows. But what I want is a rolling sum in which each window contains a certain range of values of RollBasis. That is, I'd like to be able to do something like d.roll_by(sum, 'RollBasis', 5), and get a result where the first window contains all rows whose RollBasis is between 1 and 5, then the second window contains all rows whose RollBasis is between 2 and 6, then the third window contains all rows whose RollBasis is between 3 and 7, etc. The windows will not have equal numbers of rows, but the range of RollBasis values selected in each window will be the same. So the output should be like:

>>> d.roll_by(sum, 'RollBasis', 5)
    1    -4    # sum of elements with 1 <= RollBasis <= 5
    2    -4    # sum of elements with 2 <= RollBasis <= 6
    3    -6    # sum of elements with 3 <= RollBasis <= 7
    4    -2    # sum of elements with 4 <= RollBasis <= 8
    # etc.

I can't do this with groupby, because groupby always produces disjoint groups. I can't do it with the rolling functions, because their windows always roll by number of rows, not by values. So how can I do it?

Answer

I think this does what you want:

In [1]: df
Out[1]:
   RollBasis  ToRoll
0          1       1
1          1       4
2          1      -5
3          2       2
4          3      -4
5          5      -2
6          8       0
7         10     -13
8         12      -2
9         13      -5

In [2]: def f(x):
   ...:     ser = df.ToRoll[(df.RollBasis >= x) & (df.RollBasis < x+5)]
   ...:     return ser.sum()

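The function above takes a single value x (here, one entry of RollBasis) and selects the ToRoll values from every row whose RollBasis lies in the half-open interval [x, x + 5). For integer data that is the same as x <= RollBasis <= x + 4, a window spanning five consecutive values. The selected series is then summed and returned. Note that f reads df from the enclosing scope rather than taking it as an argument.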
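To make the selection step concrete, here is a minimal sketch of what happens inside f for one value. The choice x = 3 and the names mask, window, window_rows and window_sum are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame({
    "RollBasis": [1, 1, 1, 2, 3, 5, 8, 10, 12, 13],
    "ToRoll": [1, 4, -5, 2, -4, -2, 0, -13, -2, -5],
})

x = 3  # illustrative window start

# Boolean mask: True for rows whose RollBasis falls in [x, x + 5).
mask = (df.RollBasis >= x) & (df.RollBasis < x + 5)

# Indexing ToRoll with the mask keeps only the rows inside the window.
window = df.ToRoll[mask]

window_rows = window.index.tolist()  # row labels that fell in the window
window_sum = window.sum()            # the value f(3) would return
```

Here the window holds the rows with RollBasis 3 and 5; the row with RollBasis 8 is excluded because the upper bound is strict.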

In [3]: df['Rolled'] = df.RollBasis.apply(f)

In [4]: df
Out[4]:
   RollBasis  ToRoll  Rolled
0          1       1      -4
1          1       4      -4
2          1      -5      -4
3          2       2      -4
4          3      -4      -6
5          5      -2      -2
6          8       0     -15
7         10     -13     -20
8         12      -2      -7
9         13      -5      -5
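The question asked for something shaped like d.roll_by(sum, 'RollBasis', 5). pandas has no such method, but the same masking idea can be wrapped in a small helper. The sketch below is one possible shape, not an established API: the function name roll_by, its parameter order, and the optional starts argument are all assumptions made for illustration.

```python
import pandas as pd


def roll_by(frame, func, basis, width, value, starts=None):
    """Hypothetical helper: apply func to frame[value] over windows of
    frame[basis] values in [start, start + width)."""
    if starts is None:
        # Default: one window per distinct basis value present in the data.
        starts = sorted(frame[basis].unique())
    results = {}
    for start in starts:
        in_window = (frame[basis] >= start) & (frame[basis] < start + width)
        results[start] = func(frame.loc[in_window, value])
    # The result is indexed by window start, not by original row label.
    return pd.Series(results, name=value)


df = pd.DataFrame({
    "RollBasis": [1, 1, 1, 2, 3, 5, 8, 10, 12, 13],
    "ToRoll": [1, 4, -5, 2, -4, -2, 0, -13, -2, -5],
})

# Windows starting at 1, 2, 3 and 4, as in the question's desired output.
rolled = roll_by(df, sum, "RollBasis", 5, "ToRoll", starts=range(1, 5))
```

Passing starts explicitly lets a window begin at a value that does not occur in the data (4 here), which the apply-based version cannot do because it only visits existing RollBasis entries. Like that version, this scans the whole frame once per window, so it is best suited to small or moderately sized data.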

Code for the toy example DataFrame in case someone else wants to try:

In [1]: from pandas import read_csv

In [2]: import io

In [3]: text = """\
   ...:    RollBasis  ToRoll
   ...: 0          1       1
   ...: 1          1       4
   ...: 2          1      -5
   ...: 3          2       2
   ...: 4          3      -4
   ...: 5          5      -2
   ...: 6          8       0
   ...: 7         10     -13
   ...: 8         12      -2
   ...: 9         13      -5
   ...: """

In [4]: df = read_csv(io.StringIO(text), header=0, index_col=0, sep=r'\s+')
