Create rolling covariance matrix in pandas


Problem description

I am trying to create a set of rolling covariance matrices on financial data (window size = 60). returns is a 125x3 DataFrame.

import pandas as pd

roll_rets = returns.rolling(window=60)
Omega = roll_rets.cov()

Omega is a 375x3 DataFrame with what looks like a MultiIndex, i.e. there are 3 rows for each timestamp.

What I actually want this to return is a set of 66 3x3 covariance matrices (i.e. one for each complete 60-period window), but I can't work out how to iterate over returns correctly to do this. I think I'm missing something obvious. Thanks.

Recommended answer

Firstly: a MultiIndex DataFrame is an iterable object. (Try bool(pd.DataFrame.__iter__).) There are several StackOverflow questions on iterating through the sub-frames of a MultiIndex DataFrame, if you are interested.
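As a rough illustration of iterating through the sub-frames: iterating a DataFrame directly only yields its column labels, so one common approach is to group on the outer (date) level of the index. The sketch below assumes random stand-in data instead of real returns, and the names rng, omega, matrices and first_date are illustrative, not part of the original question.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): random values in place of real returns.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.standard_normal((125, 3)),
    index=pd.date_range("2010-01-01", periods=125),
    columns=list("abc"),
)

# Rolling covariance; dropna() removes the incomplete leading windows.
omega = returns.rolling(60).cov().dropna()

# Grouping on index level 0 (the date) yields one 3x3 sub-frame per window.
# droplevel(0) leaves just the column labels as the row index.
matrices = {date: sub.droplevel(0) for date, sub in omega.groupby(level=0)}

first_date = returns.index[59]  # end date of the first complete window
print(len(matrices))
print(matrices[first_date])
```

Each value here stays a small DataFrame, so the row and column labels are kept alongside the numbers.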

But to answer your question directly, here is a dict: the keys are the (end) dates, and each value is a 3x3 NumPy array.

import pandas as pd
import numpy as np

Omega = (pd.DataFrame(np.random.randn(125,3), 
                      index=pd.date_range('1/1/2010', periods=125),
                      columns=list('abc'))
         .rolling(60)
         .cov()
         .dropna()) # this will get you to 66 windows instead of 125 with NaNs

dates = Omega.index.get_level_values(0).unique() # one end date per window (each date appears 3 times in the MultiIndex)
d = {date: Omega.loc[date].values for date in dates} # date -> 3x3 NumPy array

Is this efficient? No, not very. You are creating a separate NumPy array for each value of the dict, and each NumPy array carries its own dtype and other metadata. The DataFrame as it is now is arguably well-suited for your purpose. But one other solution is to create a single NumPy array by expanding the ndim of Omega.values:

Omega.values.reshape(66, 3, 3)

Here each element is a matrix (again, easily iterable, but you lose the date indexing that you had in your DataFrame).

Omega.values.reshape(66, 3, 3)[-1] # last matrix/final date
Out[29]: 
array([[ 0.80865977, -0.06134767,  0.04522074],
       [-0.06134767,  0.67492558, -0.12337773],
       [ 0.04522074, -0.12337773,  0.72340524]])
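
If the date lookup matters, one possible workaround is to keep the unique end dates next to the reshaped array and translate a date into a position along the first axis. This is only a sketch under assumptions: the data is random stand-in data, and the names rng, omega, cube, dates and pos are illustrative. It also checks the last matrix against a direct np.cov call on the final 60 rows.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): random values in place of real returns.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.standard_normal((125, 3)),
    index=pd.date_range("2010-01-01", periods=125),
    columns=list("abc"),
)
omega = returns.rolling(60).cov().dropna()

n_assets = returns.shape[1]
# Unique end dates, in the same order as the rows of omega.
dates = omega.index.get_level_values(0).unique()

# One (n_assets x n_assets) matrix per window along the first axis.
cube = omega.to_numpy().reshape(len(dates), n_assets, n_assets)

# Translate a date into its position along the first axis.
pos = dates.get_loc(pd.Timestamp("2010-04-01"))
print(cube[pos])

# Sanity check: the last matrix should match a direct computation
# on the final 60 rows (both use the sample covariance, ddof=1).
expected = np.cov(returns.iloc[-60:].to_numpy(), rowvar=False)
print(np.allclose(cube[-1], expected))
```

Deriving the first dimension from len(dates) rather than hard-coding 66 keeps the reshape valid if the window size or the number of rows changes.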
