Fun with Pandas `rolling_apply` and TypeError

Question

I'm really struggling with the Pandas rolling_apply function. I'm trying to apply a filter to some time series data like the one below and build a new series that flags outliers. I want the result to be True wherever the value is an outlier.

import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

window, alpha, gamma = 60, .05, .03

def trim_moments(arr, alpha):
    arr = np.sort(arr)  # np.sort returns a sorted copy; it does not sort in place
    n = len(arr)
    k = int(round(n*float(alpha))/2)
    return np.mean(arr[k+1:n-k]), np.std(arr[k+1:n-k])

# First function that tests whether criteria is met.
def bg_test(arr,alpha,gamma):
    local_mean, local_std = trim_moments(arr, alpha)
    return np.abs(arr - local_mean) < 3 * local_std + gamma

This is the call I ran:

outliers = pd.rolling_apply(ts, window, bg_test, args=(alpha,gamma))

It returns this error:

TypeError: only length-1 arrays can be converted to Python scalars

My troubleshooting indicates that the problem lies in the boolean return statement. I keep getting a similar error when I simplify the function and use np.mean/np.std rather than my own functions. Earlier questions about this TypeError traced it to non-vectorized operations on NumPy arrays, but that doesn't seem to be the issue here.

What am I doing wrong?

Recommended answer

The message isn't very helpful, but I believe the error occurs because rolling_apply currently expects the applied function to return a like-typed value for each window (it may even have to be a float), whereas bg_test returns a boolean array with one element per entry in the window. If you break your three operations (mean, std, outlier logic) into separate steps, it should work fine.
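The likely mechanism can be reproduced without pandas. The sketch below assumes, as the answer suspects, that each window's result gets converted to a single float; the helper names `returns_array` and `returns_scalar` are hypothetical and exist only for this illustration.

```python
import numpy as np

def returns_array(window_values):
    # Same shape of result as bg_test: one boolean per element of the window.
    return np.abs(window_values - window_values.mean()) < 1.0

def returns_scalar(window_values):
    # One number per window, which is what a rolling reduction can store.
    return window_values.mean()

window_values = np.array([0.5, -1.2, 2.0])

# Converting a multi-element array to a single float raises TypeError.
try:
    float(returns_array(window_values))
    conversion_failed = False
except TypeError:
    conversion_failed = True

# A scalar result converts without complaint.
scalar_result = float(returns_scalar(window_values))
```

Each function passed to rolling_apply in the answer's code, by contrast, returns a single number per window.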

ts.name = 'value'

df = pd.DataFrame(ts)

def trimmed_apply(arr, alpha, f):
    arr = np.sort(arr)  # np.sort returns a sorted copy; it does not sort in place
    n = len(arr)
    k = int(round(n*float(alpha))/2)
    return f(arr[k+1:n-k])

df['trimmed_mean'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.mean))
df['trimmed_std'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.std))

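# True where the value deviates from the trimmed mean by more than
# 3 trimmed standard deviations plus gamma, i.e. where it is an outlier.
df['outlier'] = np.abs(df['value'] - df['trimmed_mean']) > 3 * df['trimmed_std'] + gamma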
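The top-level `pd.rolling_apply` function was removed in later pandas releases, so the code above will not run on a current installation. What follows is a minimal sketch of the same three-step approach using the `Series.rolling(...).apply(...)` method instead. The seeded random generator and the helper name `trimmed_stat` are assumptions made for the example, and the slice `arr[k + 1:n - k]` is kept exactly as in the answer.

```python
import numpy as np
import pandas as pd

def trimmed_stat(arr, alpha, f):
    """Apply f to one window after sorting it and trimming its tails."""
    arr = np.sort(arr)                    # sorted copy of the window
    n = len(arr)
    k = int(round(n * float(alpha)) / 2)
    return f(arr[k + 1:n - k])            # slice kept from the answer above

# Seeded data so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(1000),
               index=pd.date_range("2000-01-01", periods=1000))

window, alpha, gamma = 60, 0.05, 0.03

# raw=True hands each window to the function as a plain NumPy array.
roll = ts.rolling(window)
trimmed_mean = roll.apply(trimmed_stat, raw=True, args=(alpha, np.mean))
trimmed_std = roll.apply(trimmed_stat, raw=True, args=(alpha, np.std))

# Vectorized outlier test on whole series; each rolling call returned one
# float per window, so no array-valued result is ever passed back to apply.
outlier = (ts - trimmed_mean).abs() > 3 * trimmed_std + gamma
```

The first `window - 1` entries of `trimmed_mean` and `trimmed_std` are NaN because those windows are incomplete, and comparisons against NaN evaluate to False, so those rows are never flagged.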
