Rolling PCA on pandas dataframe

Problem description

I'm wondering if anyone knows how to implement a rolling/moving window PCA on a pandas dataframe. I've looked around and found implementations in R and MATLAB, but not in Python. Any help would be appreciated!

This is not a duplicate: moving window PCA is not the same as PCA on the entire dataframe. Please see pandas.DataFrame.rolling() if you do not understand the difference.

Recommended answer

Unfortunately, pandas.DataFrame.rolling() seems to flatten the df before rolling: the function given to apply() receives one-dimensional windows rather than blocks of rows. So it cannot be used, as one might expect, to roll over the rows of the df and pass windows of rows to the PCA.
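To make this limitation concrete, here is a minimal sketch that records what rolling().apply() actually passes to the applied function. The toy dataframe, the column names, and the helper record_shape are made up for illustration and are not part of the original answer; raw=True is assumed to be available (pandas 0.23 or later).

```python
import numpy as np
import pandas as pd

# Toy data: 6 rows, 2 columns (hypothetical example data)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=["a", "b"])

seen_shapes = []

def record_shape(values):
    # With raw=True, `values` is a NumPy array holding one window of ONE column
    seen_shapes.append(values.shape)
    return 0.0  # rolling.apply expects a numeric scalar back

df.rolling(3).apply(record_shape, raw=True)

# Every recorded shape is (3,): a 1-D window, never a (3, 2) block of rows
print(seen_shapes)
```

Since each call only ever sees a single column, a PCA across features cannot be fitted inside the applied function directly.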

The following is a workaround based on rolling over row indices instead of rows. It may not be very elegant, but it works:

# Imports needed by the example
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Generate some data (1000 time points, 10 features)
data = np.random.random(size=(1000, 10))
df = pd.DataFrame(data)

# Set the window size
window = 100

# Initialize an empty df of appropriate size for the output
df_pca = pd.DataFrame(np.zeros((data.shape[0] - window + 1, data.shape[1])))

# Define PCA fit-transform function
# Note: Instead of attempting to return the result,
#       it is written into the previously created output array.
def rolling_pca(window_data):
    # rolling() hands the indices over as floats; cast them back to ints for iloc
    idx = np.asarray(window_data, dtype=int)
    pca = PCA()
    transf = pca.fit_transform(df.iloc[idx])
    # Store the projection of the window's first row at the window's start index
    df_pca.iloc[idx[0]] = transf[0, :]
    return 0.0  # apply() expects a numeric scalar; the value itself is unused

# Create a df containing row indices for the workaround
df_idx = pd.DataFrame(np.arange(df.shape[0]))

# Use `rolling` to apply the PCA function (raw=True passes plain NumPy arrays)
_ = df_idx.rolling(window).apply(rolling_pca, raw=True)

# The results are now contained here:
print(df_pca)

A quick check reveals that the values produced by this workaround are identical to control values computed by manually slicing the appropriate windows and running PCA on them.
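The manual control computation is not shown in the answer, so the following is only a sketch of what such a check might look like. The function name rolling_pca_manual, the seeded random data, and the window size are assumptions for illustration. Like the workaround, it keeps only the projection of each window's first row.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def rolling_pca_manual(df, window):
    """Fit a PCA on each window of consecutive rows and keep the first row's projection."""
    n_windows = len(df) - window + 1
    out = np.empty((n_windows, df.shape[1]))
    for start in range(n_windows):
        # Slice the window explicitly by row position
        block = df.iloc[start:start + window]
        out[start] = PCA().fit_transform(block)[0, :]
    return pd.DataFrame(out)

# Hypothetical test data: 200 time points, 5 features
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 5)))

control = rolling_pca_manual(df, window=50)
print(control.shape)  # (151, 5)
```

Comparing control against the workaround's df_pca with np.allclose, on the same data and window size, would be one way to reproduce the check. The explicit loop is also a reasonable alternative to the workaround itself, since it avoids writing into an outer dataframe as a side effect.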
