Pandas - Compute z-score for all columns
Question
I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:
ID Age BMI Risk Factor
PT 6 48 19.3 4
PT 8 43 20.9 NaN
PT 2 39 18.1 3
PT 9 41 19.5 NaN
Some of my columns contain NaN values, which I do not want to include in the z-score calculations, so I intend to use a solution offered to this question: how to zscore normalize pandas column with nans?
df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
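A note on why this formula ignores missing values: pandas `Series.mean()` and `Series.std()` skip NaN by default (`skipna=True`), so missing entries drop out of the statistics and simply stay NaN in the result. A minimal sketch on a made-up column `a` (the values are illustrative, not taken from the data above):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 3.0]})

# mean() and std() ignore NaN by default (skipna=True);
# ddof=0 requests the population standard deviation
df["zscore"] = (df.a - df.a.mean()) / df.a.std(ddof=0)
print(df)
```

The `ddof=0` argument matters here because pandas otherwise defaults to `ddof=1` (the sample standard deviation).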
I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using
df2.to_excel("Z-Scores.xlsx")
So basically, how can I compute z-scores for each column (ignoring NaN values) and push everything into a new dataframe?
SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.
Recommended answer
Build a list from the columns and remove the column you don't want to calculate the Z score for:
In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]
Out[66]:
   Age  BMI  Risk  Factor
0    6   48  19.3       4
1    8   43  20.9     NaN
2    2   39  18.1       3
3    9   41  19.5     NaN
In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df
Out[68]:
   ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore  Factor_zscore
0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946              1
1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148            NaN
2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517             -1
3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315            NaN
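The question also asks for the z-scores in a separate dataframe, whereas the loop adds columns to `df` in place. One possible alternative, sketched here under the assumption that the columns are laid out as the answer parsed them (`ID`, `Age`, `BMI`, `Risk`, `Factor`), is to let pandas apply the arithmetic to all remaining columns at once and collect the result in `df2` (the name used in the question's `to_excel` call):

```python
import numpy as np
import pandas as pd

# Sample data, laid out the way the answer parsed it
df = pd.DataFrame({
    "ID": ["PT", "PT", "PT", "PT"],
    "Age": [6, 8, 2, 9],
    "BMI": [48, 43, 39, 41],
    "Risk": [19.3, 20.9, 18.1, 19.5],
    "Factor": [4, np.nan, 3, np.nan],
})

# Every column except ID
values = df.drop(columns="ID")

# Column-wise z-scores; mean() and std() skip NaN by default
zscores = (values - values.mean()) / values.std(ddof=0)

# New dataframe: ID first, then the z-scores with a suffix
df2 = pd.concat([df[["ID"]], zscores.add_suffix("_zscore")], axis=1)
print(df2)

# Writing the file needs an Excel engine such as openpyxl installed:
# df2.to_excel("Z-Scores.xlsx", index=False)
```

No index manipulation is needed for this: the index is just the default row labels 0 to 3, and subtracting `values.mean()` lines up by column name automatically.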