pandas - 计算所有列的z分数 [英] Pandas - Compute z-score for all columns

查看:219
本文介绍了 pandas - 计算所有列的z分数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含一列ID的数据框,所有其他列都是我想要计算z分数的数值。以下是它的一个小节:

I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:

ID      Age    BMI    Risk Factor
PT 6    48     19.3    4
PT 8    43     20.9    NaN
PT 2    39     18.1    3
PT 9    41     19.5    NaN

我的一些列包含NaN值,我不想将其包括在z-score计算中,所以我打算使用提供给这个问题的解决方案:如何使用nans来规范化pandas列?

Some of my columns contain NaN values which I do not want to include into the z-score calculations so I intend to use a solution offered to this question: how to zscore normalize pandas column with nans?

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)

我有兴趣将此解决方案应用于我的所有列,除了ID列以生成新数据帧,我可以使用

I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using

df2.to_excel("Z-Scores.xlsx")

所以基本上;如何计算每列的z分数(忽略NaN值)并将所有内容推送到新的数据框?

So basically; how can I compute z-scores for each column (ignoring NaN values) and push everything into a new dataframe?

SIDENOTE:pandas中有一个名为索引的概念这让我很害怕,因为我不太了解它。如果索引是解决此问题的关键部分,请愚蠢地解释索引。

SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.

推荐答案

从列构建列表并删除您不想计算Z分数的列:

Build a list from the columns and remove the column you don't want to calculate the Z score for:

In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:
   Age  BMI  Risk  Factor
0    6   48  19.3       4
1    8   43  20.9     NaN
2    2   39  18.1       3
3    9   41  19.5     NaN
In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df
Out[68]:
   ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore  \
0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946   
1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148   
2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517   
3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315   

   Factor_zscore  
0              1  
1            NaN  
2             -1  
3            NaN  

这篇关于 pandas - 计算所有列的z分数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆