pandas - 计算所有列的z分数 [英] Pandas - Compute z-score for all columns
问题描述
我有一个包含一列ID的数据框,所有其他列都是我想要计算z分数的数值。以下是它的一个小节:
I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it:
ID Age BMI Risk Factor
PT 6 48 19.3 4
PT 8 43 20.9 NaN
PT 2 39 18.1 3
PT 9 41 19.5 NaN
我的一些列包含NaN值,我不想将其包括在z-score计算中,所以我打算使用提供给这个问题的解决方案:如何使用nans来规范化pandas列?
Some of my columns contain NaN values which I do not want to include into the z-score calculations so I intend to use a solution offered to this question: how to zscore normalize pandas column with nans?
df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
我有兴趣将此解决方案应用于我的所有列,除了ID列以生成新数据帧,我可以使用
I'm interested in applying this solution to all of my columns except the ID column to produce a new dataframe which I can save as an Excel file using
df2.to_excel("Z-Scores.xlsx")
所以基本上;如何计算每列的z分数(忽略NaN值)并将所有内容推送到新的数据框?
So basically; how can I compute z-scores for each column (ignoring NaN values) and push everything into a new dataframe?
SIDENOTE:pandas中有一个名为索引的概念这让我很害怕,因为我不太了解它。如果索引是解决此问题的关键部分,请愚蠢地解释索引。
SIDENOTE: there is a concept in pandas called "indexing" which intimidates me because I do not understand it well. If indexing is a crucial part of solving this problem, please dumb down your explanation of indexing.
推荐答案
从列构建列表并删除您不想计算Z分数的列:
Build a list from the columns and remove the column you don't want to calculate the Z score for:
In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]
Out[66]:
Age BMI Risk Factor
0 6 48 19.3 4
1 8 43 20.9 NaN
2 2 39 18.1 3
3 9 41 19.5 NaN
In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
col_zscore = col + '_zscore'
df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df
Out[68]:
ID Age BMI Risk Factor Age_zscore BMI_zscore Risk_zscore \
0 PT 6 48 19.3 4 -0.093250 1.569614 -0.150946
1 PT 8 43 20.9 NaN 0.652753 0.074744 1.459148
2 PT 2 39 18.1 3 -1.585258 -1.121153 -1.358517
3 PT 9 41 19.5 NaN 1.025755 -0.523205 0.050315
Factor_zscore
0 1
1 NaN
2 -1
3 NaN
这篇关于 pandas - 计算所有列的z分数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!