Pandas - Compute z-score for all columns

Question

I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:

ID      Age    BMI    Risk Factor
PT 6    48     19.3    4
PT 8    43     20.9    NaN
PT 2    39     18.1    3
PT 9    41     19.5    NaN

Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use a solution offered for this question: how to zscore normalize pandas column with nans?

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
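
As a quick check of why this formula suits columns with missing values, here is a minimal sketch. It assumes a toy dataframe with a single column named `a` (the name used in the formula), filled with the Risk Factor values from the table above. pandas' `mean()` and `std()` skip NaN by default, so only the non-missing values feed the calculation and the NaN rows stay NaN in the result.

```python
import numpy as np
import pandas as pd

# Toy column named `a`, as in the formula; the values are the
# Risk Factor column from the sample table (an assumption for illustration).
df = pd.DataFrame({"a": [4, np.nan, 3, np.nan]})

# mean() and std() skip NaN by default (skipna=True), so only 4 and 3 are used.
# ddof=0 selects the population standard deviation.
df["zscore"] = (df.a - df.a.mean()) / df.a.std(ddof=0)

print(df)
```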

I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe, which I can save as an Excel file using

df2.to_excel("Z-Scores.xlsx")

So basically: how can I compute z-scores for each column (ignoring NaN values) and push everything into a new dataframe?

SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.

Recommended answer

Build a list from the columns and remove the column you don't want to calculate the Z score for:

In [66]:
# collect every column name, then drop ID so only the numeric columns remain
cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:
   Age   BMI  Risk Factor
0   48  19.3            4
1   43  20.9          NaN
2   39  18.1            3
3   41  19.5          NaN

In [68]:
# now iterate over the remaining columns and create a new zscore column for each;
# mean() and std() skip NaN by default, so missing values are ignored
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df

Out[68]:
     ID  Age   BMI  Risk Factor  Age_zscore  BMI_zscore  Risk Factor_zscore
0  PT 6   48  19.3            4    1.569614   -0.150946                   1
1  PT 8   43  20.9          NaN    0.074744    1.459148                 NaN
2  PT 2   39  18.1            3   -1.121153   -1.358517                  -1
3  PT 9   41  19.5          NaN   -0.523205    0.050315                 NaN
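
The loop above adds the z-score columns to the original dataframe. Since the question asks for a new dataframe, a loop-free variant might look like the sketch below. It is not part of the original answer. It assumes the sample rows from the question are typed in by hand, and the names `values` and `df2` are chosen here for illustration. On indexing: the index is simply the set of row labels. Calling `set_index("ID")` turns the ID column into those labels, so it is carried along with each row but left out of the arithmetic.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; real data would be loaded from a file.
df = pd.DataFrame({
    "ID": ["PT 6", "PT 8", "PT 2", "PT 9"],
    "Age": [48, 43, 39, 41],
    "BMI": [19.3, 20.9, 18.1, 19.5],
    "Risk Factor": [4, np.nan, 3, np.nan],
})

# Make ID the row labels (the "index"), leaving only numeric columns as data.
values = df.set_index("ID")

# Column-wise z-scores in one expression. mean() and std() skip NaN by default,
# and ddof=0 matches the population standard deviation used in the loop.
df2 = (values - values.mean()) / values.std(ddof=0)
df2 = df2.add_suffix("_zscore")

print(df2.round(6))
```

The original `df` is left unchanged, and `df2.to_excel("Z-Scores.xlsx")` from the question should then write the z-scores with ID as the first column, provided an Excel writer engine such as openpyxl is installed.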
