Multiple regression with pykalman?

Problem description

I'm looking for a way to generalize regression using pykalman from 1 to N regressors. We will not bother about online regression initially - I just want a toy example to set up the Kalman filter for 2 regressors instead of 1, i.e. Y = c1 * x1 + c2 * x2 + const.

For the single regressor case, the following code works. My question is how to change the filter setup so it works for two regressors:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pykalman import KalmanFilter

    if __name__ == "__main__":
        file_name = r'<path>\KalmanExample.txt'  # raw string: '\K' is not a valid escape sequence
        df = pd.read_csv(file_name, index_col = 0)
        prices = df[['ETF', 'ASSET_1']] #, 'ASSET_2']]
    
        delta = 1e-5
        trans_cov = delta / (1 - delta) * np.eye(2)
        obs_mat = np.vstack( [prices['ETF'], 
                            np.ones(prices['ETF'].shape)]).T[:, np.newaxis]
    
        kf = KalmanFilter(
            n_dim_obs=1,
            n_dim_state=2,
            initial_state_mean=np.zeros(2),
            initial_state_covariance=np.ones((2, 2)),
            transition_matrices=np.eye(2),
            observation_matrices=obs_mat,
            observation_covariance=1.0,
            transition_covariance=trans_cov
        )
    
        state_means, state_covs = kf.filter(prices['ASSET_1'].values)
    
        # Draw slope and intercept...
        pd.DataFrame(
            dict(
                slope=state_means[:, 0],
                intercept=state_means[:, 1]
            ), index=prices.index
        ).plot(subplots=True)
        plt.show()

The example file KalmanExample.txt contains the following data:

Date,ETF,ASSET_1,ASSET_2
2007-01-02,176.5,136.5,141.0
2007-01-03,169.5,115.5,143.25
2007-01-04,160.5,111.75,143.5
2007-01-05,160.5,112.25,143.25
2007-01-08,161.0,112.0,142.5
2007-01-09,155.5,110.5,141.25
2007-01-10,156.5,112.75,141.25
2007-01-11,162.0,118.5,142.75
2007-01-12,161.5,117.0,142.5
2007-01-15,160.0,118.75,146.75
2007-01-16,156.5,119.5,146.75
2007-01-17,155.0,120.5,145.75
2007-01-18,154.5,124.5,144.0
2007-01-19,155.5,126.0,142.75
2007-01-22,157.5,124.5,142.5
2007-01-23,161.5,124.25,141.75
2007-01-24,164.5,125.25,142.75
2007-01-25,164.0,126.5,143.0
2007-01-26,161.5,128.5,143.0
2007-01-29,161.5,128.5,140.0
2007-01-30,161.5,129.75,139.25
2007-01-31,161.5,131.5,137.5
2007-02-01,164.0,130.0,137.0
2007-02-02,156.5,132.0,128.75
2007-02-05,156.0,131.5,132.0
2007-02-06,159.0,131.25,130.25
2007-02-07,159.5,136.25,131.5
2007-02-08,153.5,136.0,129.5
2007-02-09,154.5,138.75,128.5
2007-02-12,151.0,136.75,126.0
2007-02-13,151.5,139.5,126.75
2007-02-14,155.0,169.0,129.75
2007-02-15,153.0,169.5,129.75
2007-02-16,149.75,166.5,128.0
2007-02-19,150.0,168.5,130.0

The single-regressor case produces the following output; for the two-regressor case I want a second "slope" plot representing c2.

Solution

Answer edited to reflect my revised understanding of the question.

If I understand correctly, you wish to model an observable output variable, Y = ETF, as a linear combination of two observable values: ASSET_1 and ASSET_2.

The coefficients of this regression are to be treated as the system states, i.e. ETF = x1*ASSET_1 + x2*ASSET_2 + x3, where x1 and x2 are the coefficients of ASSET_1 and ASSET_2 respectively, and x3 is the intercept. These coefficients are assumed to evolve slowly.
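Written out as a state-space model, the setup reads roughly as follows. This is a sketch inferred from the filter settings used in the code (identity transition, unit observation variance), with \delta standing for the `delta` parameter and the subscript t indexing rows of the data file; the notation itself is an assumption, not part of the original answer.

```latex
\begin{aligned}
x_t &= x_{t-1} + w_t, & w_t &\sim \mathcal{N}\!\left(0,\ \tfrac{\delta}{1-\delta}\, I_3\right) \\
\mathrm{ETF}_t &= H_t\, x_t + v_t, & v_t &\sim \mathcal{N}(0,\ 1) \\
H_t &= \begin{bmatrix} \mathrm{ASSET\_1}_t & \mathrm{ASSET\_2}_t & 1 \end{bmatrix}, &
x_t &= \begin{bmatrix} x_{1,t} & x_{2,t} & x_{3,t} \end{bmatrix}^{\top}
\end{aligned}
```

The regressors enter only through the time-varying observation row H_t, which is why adding a regressor amounts to adding one column to the observation matrix and one dimension to the state.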

Code implementing this is given below. Note that it simply extends the existing example by one more regressor.

Note also that you can get quite different results by playing with the delta parameter. If it is set large (far from zero), the coefficients change more rapidly and the reconstruction of the regressand is near-perfect. If it is set small (very close to zero), the coefficients evolve more slowly and the reconstruction of the regressand is less accurate. You might want to look into the Expectation Maximisation algorithm, which pykalman supports.
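To make the effect of delta concrete, here is a minimal sketch that does not use pykalman at all: a hand-written NumPy Kalman filter for the same random-walk-coefficient model, run on synthetic data. The function name `kalman_regression`, the synthetic series and the two delta values are all assumptions chosen for illustration, and the filter is not claimed to reproduce pykalman's output exactly.

```python
import numpy as np


def kalman_regression(y, X, delta, obs_var=1.0):
    """Filter y_t = [X_t, 1] . beta_t + noise, with beta_t a random walk.

    Hypothetical helper: a plain NumPy filter, not pykalman's implementation.
    """
    T = len(y)
    H_all = np.column_stack([X, np.ones(T)])   # each row is [x_1 .. x_N, 1]
    k = H_all.shape[1]
    Q = delta / (1.0 - delta) * np.eye(k)      # same transition covariance as above
    mean = np.zeros(k)
    cov = np.ones((k, k))
    means = np.empty((T, k))
    for t in range(T):
        h = H_all[t]
        if t > 0:
            cov = cov + Q                      # predict step (identity transition)
        innovation = y[t] - h @ mean
        s = h @ cov @ h + obs_var              # innovation variance (scalar)
        gain = cov @ h / s
        mean = mean + gain * innovation        # update step
        cov = cov - np.outer(gain, h @ cov)
        means[t] = mean
    y_hat = np.sum(H_all * means, axis=1)      # reconstruction from filtered states
    return means, y_hat


# Synthetic data (not the KalmanExample.txt series): two random-walk regressors.
rng = np.random.default_rng(0)
T = 200
X = 100.0 + np.cumsum(rng.normal(size=(T, 2)), axis=0)
y = 0.5 * X[:, 0] + 0.8 * X[:, 1] + 10.0 + rng.normal(scale=0.5, size=T)

_, y_hat_small = kalman_regression(y, X, delta=1e-5)
_, y_hat_large = kalman_regression(y, X, delta=0.5)
err_small = np.mean(np.abs(y - y_hat_small))
err_large = np.mean(np.abs(y - y_hat_large))
print("mean abs reconstruction error, delta=1e-5:", err_small)
print("mean abs reconstruction error, delta=0.5: ", err_large)
```

With the larger delta the filter is allowed to move the coefficients far enough at every step to absorb almost the whole innovation, so the reconstruction error shrinks; that is the trade-off described above, not evidence of a better model.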

CODE:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pykalman import KalmanFilter

    if __name__ == "__main__":
        file_name = 'KalmanExample.txt'
        df = pd.read_csv(file_name, index_col = 0)
        prices = df[['ETF', 'ASSET_1', 'ASSET_2']]
        delta = 1e-3
        trans_cov = delta / (1 - delta) * np.eye(3)
        obs_mat = np.vstack( [prices['ASSET_1'], prices['ASSET_2'],
                              np.ones(prices['ASSET_1'].shape)]).T[:, np.newaxis]
        kf = KalmanFilter(
            n_dim_obs=1,
            n_dim_state=3,
            initial_state_mean=np.zeros(3),
            initial_state_covariance=np.ones((3, 3)),
            transition_matrices=np.eye(3),
            observation_matrices=obs_mat,
            observation_covariance=1.0,
            transition_covariance=trans_cov
        )

        # state_means, state_covs = kf.em(prices['ETF'].values).smooth(prices['ETF'].values)
        state_means, state_covs = kf.filter(prices['ETF'].values)

        # Re-construct ETF from coefficients and ASSET_1 and ASSET_2 values:
        ETF_est = np.array([a.dot(b) for a, b in zip(np.squeeze(obs_mat), state_means)])

        # Draw slopes and intercept...
        pd.DataFrame(
            dict(
                slope1=state_means[:, 0],
                slope2=state_means[:, 1],
                intercept=state_means[:, 2],
            ), index=prices.index
        ).plot(subplots=True)
        plt.show()

        # Draw actual y, and estimated y:
        pd.DataFrame(
            dict(
                ETF_est=ETF_est,
                ETF_act=prices['ETF'].values
            ), index=prices.index
        ).plot()
        plt.show()
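Since the question asks about going from 1 to N regressors, the only parts of the setup that depend on N are the observation matrix and the state dimension. A minimal sketch of building them for any number of regressor columns follows; `build_obs_mat` is a hypothetical helper name, and the three sample rows are the first three ASSET_1 and ASSET_2 values from KalmanExample.txt.

```python
import numpy as np


def build_obs_mat(regressors):
    """Stack N regressor columns plus a constant column into shape (T, 1, N + 1).

    Hypothetical helper; the result has the same layout as obs_mat in the code above.
    """
    X = np.asarray(regressors, dtype=float)
    if X.ndim == 1:                      # a single regressor passed as a 1-D array
        X = X[:, np.newaxis]
    ones = np.ones((X.shape[0], 1))      # constant column for the intercept
    return np.hstack([X, ones])[:, np.newaxis, :]


# First three rows of ASSET_1 and ASSET_2 from KalmanExample.txt
X = np.array([[136.5, 141.0],
              [115.5, 143.25],
              [111.75, 143.5]])
obs_mat = build_obs_mat(X)
n_state = obs_mat.shape[2]               # N slopes + 1 intercept
print(obs_mat.shape, n_state)
```

The remaining filter arguments then follow from `n_state`: `np.zeros(n_state)` for the initial mean, `np.ones((n_state, n_state))` for the initial covariance, and `np.eye(n_state)` for the transition matrix and the scaled transition covariance, mirroring the three-state code above.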

PLOTS:
