Rolling PCA on pandas dataframe
Question
I'm wondering if anyone knows how to implement a rolling/moving window PCA on a pandas dataframe. I've looked around and found implementations in R and MATLAB but not in Python. Any help would be appreciated!
This is not a duplicate: moving window PCA is not the same as PCA on the entire dataframe. Please see pandas.DataFrame.rolling() if the difference is unclear.
Recommended answer
Unfortunately, pandas.DataFrame.rolling() seems to flatten the df before rolling: the function passed to apply receives one-dimensional windows rather than blocks of rows. It therefore cannot be used as one might expect, to roll over the rows of the df and pass windows of rows to the PCA.
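The behaviour described above can be observed with a small experiment. This is a minimal sketch, not part of the original answer: the helper record_shape and the toy dataframe are made up for illustration. It records the shape of every window that rolling().apply hands to the function.

```python
import numpy as np
import pandas as pd

# Toy dataframe: 6 rows, 2 columns (hypothetical example data)
df_demo = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), columns=["a", "b"])

shapes = []

def record_shape(window_values):
    # Hypothetical helper: remember the shape of each window we are given
    shapes.append(window_values.shape)
    return 0.0  # apply expects a single numeric value back

# raw=True passes each window as a NumPy array
df_demo.rolling(3).apply(record_shape, raw=True)

# Every recorded shape is one-dimensional: a window of a single column,
# never a (3, 2) block of rows.
print(set(shapes))
```

Since each call sees only one column at a time, a PCA across features cannot be computed inside the applied function directly.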
The following is a work-around for this based on rolling over indices instead of rows. It may not be very elegant but it works:
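# Imports (assuming PCA here is scikit-learn's sklearn.decomposition.PCA)
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Generate some data (1000 time points, 10 features)
data = np.random.random(size=(1000, 10))
df = pd.DataFrame(data)

# Set the window size
window = 100

# Initialize an empty df of appropriate size for the output
df_pca = pd.DataFrame(np.zeros((data.shape[0] - window + 1, data.shape[1])))

# Define PCA fit-transform function
# Note: Instead of attempting to return the result,
#       it is written into the previously created output array.
def rolling_pca(window_data):
    # rolling passes the row indices as floats, so cast them to int for iloc
    rows = window_data.astype(int)
    pca = PCA()
    transf = pca.fit_transform(df.iloc[rows])
    # Store the transformed first row of the window at the window's start index
    df_pca.iloc[rows[0]] = transf[0, :]
    return 0.0  # apply expects a numeric return value; it is discarded

# Create a df containing row indices for the workaround
df_idx = pd.DataFrame(np.arange(df.shape[0]))

# Use `rolling` to apply the PCA function
# (raw=True passes each window as a NumPy array rather than a Series)
_ = df_idx.rolling(window).apply(rolling_pca, raw=True)

# The results are now contained here:
print(df_pca)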
A quick check reveals that the values produced by this are identical to control values computed by slicing appropriate windows manually and running PCA on them.
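A control computation of this kind might look like the sketch below. It is not the author's check. The function names pca_scores and rolling_pca_manual are hypothetical, and PCA is done with a plain NumPy SVD instead of a PCA class, so the signs of individual components may differ from a library's output. To compare against a library directly, replace pca_scores with that library's fit_transform call.

```python
import numpy as np
import pandas as pd

def pca_scores(block):
    """Project the rows of `block` onto its principal axes (all components kept)."""
    centered = block - block.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def rolling_pca_manual(df, window):
    """Slice each window by hand and keep the transformed first row of the window."""
    values = df.to_numpy()
    n_windows = len(values) - window + 1
    out = np.zeros((n_windows, values.shape[1]))
    for start in range(n_windows):
        scores = pca_scores(values[start:start + window])
        out[start] = scores[0]
    return pd.DataFrame(out)

# Small hypothetical data set: 60 time points, 4 features
rng = np.random.default_rng(0)
df_small = pd.DataFrame(rng.random((60, 4)))
control = rolling_pca_manual(df_small, window=20)
print(control.shape)  # (41, 4): one row per window start
```

The explicit loop is slower than a vectorised approach but makes it obvious which rows enter each PCA, which is what a control value needs.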