Deprecated rolling window option in OLS from Pandas to Statsmodels


Question

As the title suggests, where has the rolling window option of the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in the works:

FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html
  model = pd.ols(y=series_1, x=mmmm, window=50)

In fact, if you do something like:

import statsmodels.api as sm

model = sm.OLS(series_1, mmmm, window=50).fit()

print(model.summary())

you get results (the window argument does not stop the code from running), but you get only the parameters of a single regression run on the entire period, not the series of parameters for each rolling period that it is supposed to produce.
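To make the expected output concrete: a rolling regression refits the model on every window of consecutive observations and returns one parameter vector per window. The following is a minimal NumPy sketch of that behaviour on synthetic data. It is neither the pandas nor the statsmodels implementation, and rolling_ols_loop is a hypothetical helper name introduced only for illustration.

```python
import numpy as np


def rolling_ols_loop(y, x, window):
    """Fit one OLS regression per window (hypothetical helper, not a library API)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Prepend a column of ones so each fit includes an intercept.
    X = np.column_stack([np.ones(len(y)), x])
    n_windows = len(y) - window + 1
    params = np.empty((n_windows, X.shape[1]))
    for i in range(n_windows):
        sl = slice(i, i + window)
        # Least-squares solution using only the observations in this window.
        params[i], *_ = np.linalg.lstsq(X[sl], y[sl], rcond=None)
    return params


# Synthetic data (an assumption for illustration): y = 1 + 2x + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

params = rolling_ols_loop(y, x, window=50)
print(params.shape)  # (151, 2): one [intercept, slope] row per window
```

A single fit on the full sample, by contrast, yields just one such row, which is what the call above returns.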

Recommended answer

I created an ols module designed to mimic pandas' deprecated MovingOLS (it is imported as pyfinance.ols in the example below).

It has three core classes:

  • OLS : static (single-window) ordinary least-squares regression. The outputs are NumPy arrays.
  • RollingOLS : rolling (multi-window) ordinary least-squares regression. The outputs are higher-dimensional NumPy arrays.
  • PandasRollingOLS : wraps the results of RollingOLS in pandas Series & DataFrames. Designed to mimic the look of the deprecated pandas module.

Note that the module is part of a package (which I'm currently in the process of uploading to PyPI) and it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. RollingOLS also makes extensive use of broadcasting. Attributes largely mimic statsmodels' OLS RegressionResultsWrapper.
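As a rough illustration of how matrix algebra and broadcasting can solve every window at once, here is a minimal sketch under stated assumptions. It is not the module's actual code: rolling_ols_vectorized is a hypothetical name, the data are synthetic, and it uses the normal equations for brevity even though they are less numerically robust than a QR- or SVD-based solve. It relies on numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ols_vectorized(y, x, window):
    """Solve OLS for all windows in one batched call (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    X = np.column_stack([np.ones(len(y)), x])  # (n, k) with intercept column
    # sliding_window_view appends the window axis last, so move it to the middle:
    # Xw has shape (n_windows, window, k), yw has shape (n_windows, window).
    Xw = sliding_window_view(X, window, axis=0).transpose(0, 2, 1)
    yw = sliding_window_view(y, window)
    # Batched normal equations (X'X) b = X'y; matmul broadcasts over windows.
    xtx = Xw.transpose(0, 2, 1) @ Xw             # (n_windows, k, k)
    xty = Xw.transpose(0, 2, 1) @ yw[..., None]  # (n_windows, k, 1)
    return np.linalg.solve(xtx, xty)[..., 0]     # (n_windows, k)


# Synthetic data (an assumption): y = 0.5 + 1.5*x1 - 0.7*x2 + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(120, 2))
y = 0.5 + x @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=120)

betas = rolling_ols_vectorized(y, x, window=12)
print(betas.shape)  # (109, 3): intercept plus two slopes per window
```

Each row of betas should agree with an ordinary least-squares fit on the corresponding 12 observations; the windowed arrays are views, so no per-window copies of the data are made before the matrix products.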

An example:

import urllib.parse
import pandas as pd
from pyfinance.ols import PandasRollingOLS

# You can also do this with pandas-datareader; here's the hard way
url = "https://fred.stlouisfed.org/graph/fredgraph.csv"

syms = {
    "TWEXBMTH" : "usd", 
    "T10Y2YM" : "term_spread", 
    "GOLDAMGBD228NLBM" : "gold",
}

params = {
    "fq": "Monthly,Monthly,Monthly",
    "id": ",".join(syms.keys()),
    "cosd": "2000-01-01",
    "coed": "2019-02-01",
}

data = pd.read_csv(
    url + "?" + urllib.parse.urlencode(params, safe=","),
    na_values={"."},
    parse_dates=["DATE"],
    index_col=0
).pct_change().dropna().rename(columns=syms)
print(data.head())
#                  usd  term_spread      gold
# DATE                                       
# 2000-02-01  0.012580    -1.409091  0.057152
# 2000-03-01 -0.000113     2.000000 -0.047034
# 2000-04-01  0.005634     0.518519 -0.023520
# 2000-05-01  0.022017    -0.097561 -0.016675
# 2000-06-01 -0.010116     0.027027  0.036599

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

print(model.beta.head())  # Coefficients excluding the intercept
#             term_spread      gold
# DATE                             
# 2001-01-01     0.000033 -0.054261
# 2001-02-01     0.000277 -0.188556
# 2001-03-01     0.002432 -0.294865
# 2001-04-01     0.002796 -0.334880
# 2001-05-01     0.002448 -0.241902

print(model.fstat.head())
# DATE
# 2001-01-01    0.136991
# 2001-02-01    1.233794
# 2001-03-01    3.053000
# 2001-04-01    3.997486
# 2001-05-01    3.855118
# Name: fstat, dtype: float64

print(model.rsq.head())  # R-squared
# DATE
# 2001-01-01    0.029543
# 2001-02-01    0.215179
# 2001-03-01    0.404210
# 2001-04-01    0.470432
# 2001-05-01    0.461408
# Name: rsq, dtype: float64
