Pandas: create a new column in a dataframe that is a function of a rolling window


Question

I have a data frame and can compute a new column of rolling 10 period means using pandas.stats.moments.rolling_mean(ExistingColumn, 10, min_periods=10). If there are fewer than 10 periods available, I get a NaN. I can do the same for rolling medians. Perfect.

I'd now like to compute other rolling functions of N periods, but can't for the life of me figure out how to use a user-defined function with Pandas. In particular, I want to compute a rolling 10-point Hodges-Lehmann mean, which is defined as follows:

import statistics

def hodgesLehmanMean(x):
    # Half the median of all pairwise sums x[i] + x[j] with i < j
    return 0.5 * statistics.median(x[i] + x[j] for i in range(len(x)) for j in range(i+1,len(x)))

How can I turn this into a rolling function that can be applied to a Pandas dataframe and returns a NaN if fewer than 10 periods are passed to it? I'm a Pandas newbie, so I'd be particularly appreciative of a simple explanation with an example.

Recommended answer

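You could use pandas.rolling_apply (an API from older pandas releases; it was deprecated in pandas 0.18 in favor of Series.rolling(...).apply):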

import numpy as np
import pandas as pd

def hodgesLehmanMean(x): 
    return 0.5 * np.median([x[i] + x[j] 
                           for i in range(len(x)) 
                           for j in range(i+1,len(x))])

df = pd.DataFrame({'foo': np.arange(20, dtype='float')})
df['bar'] = pd.rolling_apply(df['foo'], 10, hodgesLehmanMean)
print(df)

This yields:

    foo   bar
0     0   NaN
1     1   NaN
2     2   NaN
3     3   NaN
4     4   NaN
5     5   NaN
6     6   NaN
7     7   NaN
8     8   NaN
9     9   4.5
10   10   5.5
11   11   6.5
12   12   7.5
13   13   8.5
14   14   9.5
15   15  10.5
16   16  11.5
17   17  12.5
18   18  13.5
19   19  14.5
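
On pandas releases where pd.rolling_apply is no longer available, the same computation can be sketched with Series.rolling(...).apply(...). This is a minimal sketch, not the original answer's code: it assumes a pandas version that supports the raw argument, and it passes min_periods=10 explicitly to spell out the "NaN if fewer than 10 periods" rule asked for in the question.

```python
import numpy as np
import pandas as pd


def hodgesLehmanMean(x):
    # Half the median of all pairwise sums x[i] + x[j] with i < j.
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i + 1, len(x))])


df = pd.DataFrame({'foo': np.arange(20, dtype='float')})

# raw=True hands each window to the function as a NumPy array, so x[i]
# is positional indexing rather than a label lookup on a Series.
# min_periods=10 leaves NaN wherever fewer than 10 values are available.
df['bar'] = (df['foo']
             .rolling(window=10, min_periods=10)
             .apply(hodgesLehmanMean, raw=True))
print(df)
```

With raw=False each window would arrive as a Series carrying its original index labels, and x[i] would then no longer mean "the i-th element of the window", which is why the sketch opts for raw=True.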

A faster version of hodgesLehmanMean would be:

def hodgesLehmanMean_alt(x): 
    m = np.add.outer(x,x)
    ind = np.tril_indices(len(x), -1)
    return 0.5 * np.median(m[ind])
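
To see why the vectorized version computes the same pairs, it may help to trace it on a tiny input. The following sketch uses a made-up three-element array (the values are purely illustrative) and prints the intermediate pieces: np.add.outer builds the full table of sums, and np.tril_indices with offset -1 selects the entries strictly below the diagonal, so each unordered pair of distinct positions is taken exactly once.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])  # illustrative values only

# m[i, j] == x[i] + x[j] for every pair of positions.
m = np.add.outer(x, x)

# Row and column indices strictly below the main diagonal (offset -1).
rows, cols = np.tril_indices(len(x), -1)

# Selected entries: x[1]+x[0], x[2]+x[0], x[2]+x[1]
pair_sums = m[rows, cols]
estimate = 0.5 * np.median(pair_sums)

print(m)
print(pair_sums)  # [3. 5. 6.]
print(estimate)   # 2.5
```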


Here is a sanity-check showing hodgesLehmanMean_alt returns the same value as hodgesLehmanMean for 1000 random arrays of length 100:

In [68]: m = np.random.random((1000, 100))

In [69]: all(hodgesLehmanMean(x) == hodgesLehmanMean_alt(x) for x in m)
Out[69]: True
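
The same check can be run as a plain script outside IPython. This is a sketch under a few assumptions that are not in the original answer: a fixed seed (chosen arbitrarily) for reproducibility, smaller arrays to keep the run short, and np.isclose instead of exact equality as a more forgiving floating-point comparison.

```python
import numpy as np


def hodgesLehmanMean(x):
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i + 1, len(x))])


def hodgesLehmanMean_alt(x):
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), -1)
    return 0.5 * np.median(m[ind])


rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
arrays = rng.random((200, 50))   # 200 random arrays of length 50

results_match = all(np.isclose(hodgesLehmanMean(row), hodgesLehmanMean_alt(row))
                    for row in arrays)
print(results_match)
```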

Here is a benchmark showing hodgesLehmanMean_alt is about 8x faster:

In [80]: x = np.random.random(5000)

In [81]: %timeit hodgesLehmanMean(x)
1 loops, best of 3: 3.99 s per loop

In [82]: %timeit hodgesLehmanMean_alt(x)
1 loops, best of 3: 463 ms per loop
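
For readers without IPython's %timeit, a comparable measurement can be sketched with the standard-library timeit module. The array length of 300 and the repeat count are assumptions chosen to keep the run short; the absolute timings and the ratio will differ from the figures above depending on machine, input size, and library versions.

```python
import timeit

import numpy as np


def hodgesLehmanMean(x):
    return 0.5 * np.median([x[i] + x[j]
                            for i in range(len(x))
                            for j in range(i + 1, len(x))])


def hodgesLehmanMean_alt(x):
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), -1)
    return 0.5 * np.median(m[ind])


x = np.random.default_rng(0).random(300)  # smaller than the 5000 used above

# Best of 3 single runs for each implementation.
loop_time = min(timeit.repeat(lambda: hodgesLehmanMean(x), number=1, repeat=3))
alt_time = min(timeit.repeat(lambda: hodgesLehmanMean_alt(x), number=1, repeat=3))

print(f"loop: {loop_time:.4f} s, outer: {alt_time:.4f} s, "
      f"ratio: {loop_time / alt_time:.1f}x")
```

One trade-off to keep in mind: np.add.outer materializes the full n-by-n table of sums, so for the 5000-element array benchmarked above it holds 25 million float64 values (about 200 MB), whereas a rolling window of 10 only needs a 10-by-10 table.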
