Pandas - Expanding Z-Score Across Multiple Columns

Views: 61 Published: 2021/6/14 18:36:15 Tags: python pandas pandas-groupby


Question

I want to calculate an expanding z-score for some time series data that I have in a DataFrame, but I want to standardize the data using the mean and standard deviation of multiple columns, rather than the mean and standard deviation within each column separately. I believe that I want to use some combination of groupby and DataFrame.expanding, but I can't seem to figure it out. Here's some example data:

import pandas as pd
import numpy as np
np.random.seed(42)

df = pd.DataFrame(np.random.rand(5,5),
                  columns=list('ABCDE'),
                  index=pd.date_range('2016-12-31', periods=5))

df.index.name = 'DATE'

df

Input:

Desired output:

I have dates down the rows and the data series as individual columns. What I want is a new DataFrame of the same shape containing the expanding Z-Score. What I can't figure out is how to get the df.expanding(2).mean() method to aggregate across multiple columns. That is, rather than taking the expanding mean of column A and subtracting it from the value in column A, I want to take the expanding mean of the values in columns A through E and subtract that mean from the value in A.

If you think in terms of Excel, what I'm talking about is the difference between =AVERAGE(B$2:B3) and =AVERAGE($B$2:$F3). Doing the former is incredibly simple (df.expanding(2).mean()), but I can't for the life of me figure out how to do the latter.

I've experimented a lot with various combinations of groupby, stack(), and expanding(), to no avail.

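The Excel distinction in the question can be sketched in pandas as follows. This is only an illustration of the two quantities, not the asker's or the answerer's code: the names per_column_mean, cum_total, cum_count and pooled_mean are made up here, and the pooled mean is built from running sums rather than from DataFrame.expanding.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.rand(5, 5),
                  columns=list('ABCDE'),
                  index=pd.date_range('2016-12-31', periods=5))

# Per-column expanding mean, like =AVERAGE(B$2:B3):
# each column is averaged on its own, giving a DataFrame of the same shape.
per_column_mean = df.expanding(2).mean()

# Pooled expanding mean, like =AVERAGE($B$2:$F3):
# every cell in all columns, from the first row down to the current row,
# goes into one average, giving a single value per row.
cum_total = df.sum(axis=1).cumsum()    # running sum of all cells so far
cum_count = df.count(axis=1).cumsum()  # running count of non-NaN cells so far
pooled_mean = cum_total / cum_count
```

Note that per_column_mean is NaN in its first row because of the minimum of 2 observations per column, whereas pooled_mean already has 5 observations in the first row and is defined there.
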
Recommended answer

This is my own attempt at calculating the expanding Z-Scores while pooling all of the columns. Comments on how to do it more efficiently are welcome.

import numpy as np
import pandas as pd

def pooled_expanding_zscore(df, min_periods=2):
    """Calculates an expanding Z-Score down the rows of the DataFrame while pooling all of the columns.

    Assumes that indexes are not hierarchical.
    Assumes that the index is named (df.index.name is not None).
    Assumes that df does not have columns named 'exp_mean' and 'exp_std'.
    The result has its columns in sorted order.
    """

    # Get last sorted column name.
    # sorted() returns a new list; sorting df.columns.values in place
    # could reorder the column labels of the caller's DataFrame.
    colNames = sorted(df.columns)
    lastCol = colNames[-1]

    # Index name
    indexName = df.index.name

    # Reshape to long format, ordered by date and then by column name
    df_stacked = (pd.melt(df.reset_index(), id_vars=indexName,
                          var_name='variable', value_name='value')
                  .sort_values(by=[indexName, 'variable']))

    # Calculates the expanding mean and standard deviation on df_stacked
    df_exp = df_stacked['value'].expanding(2)
    df_stacked['exp_mean'] = df_exp.mean()
    df_stacked['exp_std'] = df_exp.std()

    # Keeps just the rows where 'variable'==lastCol, i.e. the statistics
    # that include every value up to and including each date
    exp_stats = (df_stacked.loc[df_stacked['variable'] == lastCol, :]
                 .reset_index()
                 .drop(['index', 'variable', 'value'], axis=1)
                 .set_index(indexName))

    # add exp_mean and exp_std back to df
    df = pd.concat([df, exp_stats], axis=1)

    # Calculate Z-Score
    # (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
    df_mat = df.loc[:, colNames].to_numpy()
    exp_mean_mat = df.loc[:, 'exp_mean'].to_numpy()[:, np.newaxis]
    exp_std_mat = df.loc[:, 'exp_std'].to_numpy()[:, np.newaxis]
    zScores = pd.DataFrame(
        (df_mat - exp_mean_mat) / exp_std_mat,
        index=df.index,
        columns=colNames)

    # Use min_periods to kill off early rows
    zScores.iloc[:max(min_periods - 1, 0), :] = np.nan

    return zScores
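
On the request for a more efficient approach, one possible alternative is to skip the melt/concat round trip and build the pooled statistics from running sums. The sketch below is an assumption-laden illustration, not part of the original answer: the function name pooled_expanding_zscore_cumsum is hypothetical, it uses the sample standard deviation (ddof=1, which is what pandas .std() uses by default), and it keeps the original column order instead of sorting the columns.

```python
import numpy as np
import pandas as pd


def pooled_expanding_zscore_cumsum(df, min_periods=2):
    """Hypothetical sketch: pooled expanding z-score via running sums."""
    values = df.to_numpy(dtype=float)

    # Running count, sum and sum of squares over all cells seen so far,
    # ignoring NaN cells.
    n = np.cumsum((~np.isnan(values)).sum(axis=1)).astype(float)
    total = np.cumsum(np.nansum(values, axis=1))
    total_sq = np.cumsum(np.nansum(values ** 2, axis=1))

    with np.errstate(divide='ignore', invalid='ignore'):
        mean = total / n
        # Sample variance (ddof=1) from the running sums; clipped at zero
        # to guard against tiny negative values caused by rounding.
        var = (total_sq - n * mean ** 2) / (n - 1)
        std = np.sqrt(np.maximum(var, 0.0))
        # Broadcast the per-row pooled statistics across the columns.
        z = (values - mean[:, np.newaxis]) / std[:, np.newaxis]

    # Blank out the first rows, as the original function does.
    z[:max(min_periods - 1, 0), :] = np.nan
    return pd.DataFrame(z, index=df.index, columns=df.columns)


# Usage with the example data from the question
np.random.seed(42)
df = pd.DataFrame(np.random.rand(5, 5),
                  columns=list('ABCDE'),
                  index=pd.date_range('2016-12-31', periods=5))
df.index.name = 'DATE'
z = pooled_expanding_zscore_cumsum(df)
```

The sum-of-squares formula for the variance can lose precision when the values are large relative to their spread, so this shortcut trades some numerical robustness for speed. For data like the example (uniform values in [0, 1)) that should not matter, but it is worth checking on real data.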

