How to calculate moving / running / rolling arbitrary function (e.g. kurtosis & skewness) using NumPy / SciPy


Question

I am working with time-series data. To extract features from the data, I have to calculate the moving mean, median, mode, slope, kurtosis, skewness, etc. I am familiar with scipy.stats, which provides an easy way to calculate these quantities over a whole array. But for the moving/running versions, I have searched all over the internet and found nothing.

Surprisingly, the moving mean, median, and mode are very easy to calculate with NumPy. Unfortunately, there is no built-in function for calculating moving kurtosis and skewness. Can someone show how to calculate moving kurtosis and skewness with SciPy? Many thanks.

Answer

Pandas offers a DataFrame.rolling() method which can be used, in combination with its Rolling.apply() method (i.e. df.rolling().apply()) to apply an arbitrary function to the specified rolling window.
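
As a minimal sketch of the Pandas route, the snippet below applies scipy.stats.kurtosis and scipy.stats.skew over a rolling window of a Series. The names series, window, rolling_kurt and rolling_skew are illustrative and not part of the original answer. raw=True hands each window to the function as a plain NumPy array, and the first window - 1 entries are NaN because their windows are incomplete.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Illustrative input: the same 0..29 ramp used elsewhere in this answer.
series = pd.Series(np.arange(30, dtype=float))
window = 4

# raw=True passes each window as a NumPy array instead of a Series.
rolling_kurt = series.rolling(window).apply(scipy.stats.kurtosis, raw=True)
rolling_skew = series.rolling(window).apply(scipy.stats.skew, raw=True)

# Incomplete leading windows yield NaN; drop them to get the valid values.
print(rolling_kurt.dropna().to_numpy())
print(rolling_skew.dropna().to_numpy())
```

Pandas also ships Rolling.kurt() and Rolling.skew(), but these use bias-corrected estimators, so their values generally differ from the SciPy defaults used here.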


If you are looking for a NumPy-based solution, you could use FlyingCircus (disclaimer: I am the main author of it).

There you can find the following:

  1. flyingcircus.extra.running_apply(): can apply any function to a 1D array and supports weights, but it is slow;
  2. flyingcircus.extra.moving_apply(): can apply any function that supports an axis: int parameter to a 1D array and supports weights; it is fast (but memory-hungry);
  3. flyingcircus.extra.rolling_apply_nd(): can apply any function that supports an axis: int|Sequence[int] parameter to any ND array; it is fast (and memory-efficient), but it does not support weights.

Based on your requirements, I would suggest using rolling_apply_nd(), e.g.:

import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
import flyingcircus as fc

NUM = 30
arr = np.arange(NUM)

window = 4
new_arr = fc.extra.rolling_apply_nd(arr, window, func=sp.stats.kurtosis)
print(new_arr)
# [-1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36]
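
If installing FlyingCircus is not an option, a similar result can be sketched with plain NumPy and SciPy. This is not how FlyingCircus is implemented; it assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view() is available, and it works with any function that accepts an axis argument. The variable names are illustrative.

```python
import numpy as np
import scipy.stats
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(30)
window = 4

# Read-only view of shape (len(arr) - window + 1, window); no data is copied.
windows = sliding_window_view(arr, window)

# Reduce along the last axis, i.e. once per window.
moving_kurt = scipy.stats.kurtosis(windows, axis=-1)
moving_skew = scipy.stats.skew(windows, axis=-1)

print(moving_kurt.shape)  # (27,)
print(np.round(moving_kurt, 2))
```

Only complete windows are returned, so the output has len(arr) - window + 1 elements, the same length as the rolling_apply_nd() output shown above.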

Of course, feel free to inspect the source code; it is open source (GPL).


EDIT

Just to get a feeling of the kind of speed we are talking about, these are the benchmarks for the solutions implemented in FlyingCircus:

The general approach, flyingcircus.extra.running_apply(), is a couple of orders of magnitude slower than either flyingcircus.extra.rolling_apply_nd() or flyingcircus.extra.moving_apply(), with rolling_apply_nd() being approximately one order of magnitude faster than moving_apply(). This shows the price, in speed, of generality and of support for weights.

The above plots were obtained using the scripts from here and the following code:

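import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
import flyingcircus as fc

# benchmark() and plot_benchmarks() are not defined here; they come from the
# scripts linked above.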


WINDOW = 4
FUNC = sp.stats.kurtosis


def my_rolling_apply_nd(arr, window=WINDOW, func=FUNC):
    return fc.extra.rolling_apply_nd(arr, window, func=func)


def my_moving_apply(arr, window=WINDOW, func=FUNC):
    return fc.extra.moving_apply(arr, window, func)


def my_running_apply(arr, window=WINDOW, func=FUNC):
    return fc.extra.running_apply(arr, window, func)


def equal_output(a, b):
    return np.all(np.isclose(a, b))


input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_rolling_apply_nd, my_moving_apply, my_running_apply

runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=np.random.random, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels, units='s')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=8)
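
The benchmark helpers above are external, so here is a self-contained sketch of the same kind of comparison using only the standard library's timeit. It does not reproduce the FlyingCircus functions: kurtosis_loop and kurtosis_vectorized are hypothetical stand-ins for a fully general per-window approach and an axis-based vectorized one, and the array size is arbitrary.

```python
import timeit

import numpy as np
import scipy.stats
from numpy.lib.stride_tricks import sliding_window_view


def kurtosis_loop(arr, window):
    # General approach: one Python-level call per window; works with any
    # function of a 1D array, but pays the call overhead every time.
    return np.array([
        scipy.stats.kurtosis(arr[i:i + window])
        for i in range(len(arr) - window + 1)])


def kurtosis_vectorized(arr, window):
    # Axis-based approach: a single call over a strided view of all windows.
    return scipy.stats.kurtosis(sliding_window_view(arr, window), axis=-1)


rng = np.random.default_rng(0)
arr = rng.random(1000)
window = 4

loop_result = kurtosis_loop(arr, window)
vec_result = kurtosis_vectorized(arr, window)

for func in (kurtosis_loop, kurtosis_vectorized):
    seconds = timeit.timeit(lambda: func(arr, window), number=3) / 3
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```

Absolute timings depend on the machine and library versions, so none are quoted here; the point is only that both functions return the same values while doing very different amounts of Python-level work.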
