Pandas GroupBy and Calculate Z-Score
Question
I have a dataframe that looks like this:
import pandas as pd
test = pd.DataFrame([[1, 10, 14], [1, 12, 14], [1, 20, 12], [1, 25, 12], [2, 18, 12], [2, 30, 14], [2, 4, 12], [2, 10, 14]], columns=['A', 'B', 'C'])
   A   B   C
0  1  10  14
1  1  12  14
2  1  20  12
3  1  25  12
4  2  18  12
5  2  30  14
6  2   4  12
7  2  10  14
My goal is to get the z-scores of column B, computed within the groups defined by columns A and C. I know I can calculate the mean and standard deviation of each group:
test.groupby(['A', 'C']).mean()
         B
A C
1 12  22.5
  14  11.0
2 12  11.0
  14  20.0
test.groupby(['A', 'C']).std()
              B
A C
1 12   3.535534
  14   1.414214
2 12   9.899495
  14  14.142136
Now, for every item in column B, I want to calculate its z-score based on these means and standard deviations. So the first result would be (10 - 11) / 1.41. I feel like there has to be a way to do this without too much complexity, but I've been stuck on how to proceed. Let me know if anyone can point me in the right direction or if I need to clarify anything!
Recommended answer
Use transform, which returns each group's statistic aligned with the original rows:
mean = test.groupby(['A', 'C']).B.transform('mean')
std = test.groupby(['A', 'C']).B.transform('std')
Then:
(test.B - mean) / std
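Put together as a self-contained script, the two steps above might look like the following minimal sketch. The column name B_z and the variable names are only illustrative choices, not something from the question.

```python
import pandas as pd

test = pd.DataFrame(
    [[1, 10, 14], [1, 12, 14], [1, 20, 12], [1, 25, 12],
     [2, 18, 12], [2, 30, 14], [2, 4, 12], [2, 10, 14]],
    columns=['A', 'B', 'C'],
)

# Group once and reuse the grouped column for both statistics.
grouped_b = test.groupby(['A', 'C'])['B']

# transform returns a Series aligned with the original rows,
# so each row receives its own group's mean and standard deviation.
group_mean = grouped_b.transform('mean')
group_std = grouped_b.transform('std')  # sample std (ddof=1), the pandas default

# 'B_z' is a hypothetical column name chosen for this example.
test['B_z'] = (test['B'] - group_mean) / group_std
print(test)
```

Storing the result as a new column keeps each z-score next to the row it belongs to, which makes it easy to check the first value against (10 - 11) / 1.41 from the question.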
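Or with scipy. Passing ddof=1 makes zscore use the sample standard deviation, matching the default of pandas' std():
from scipy.stats import zscore
test.groupby(['A', 'C']).B.transform(lambda x: zscore(x, ddof=1))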
Out[140]:
0 -0.707107
1 0.707107
2 -0.707107
3 0.707107
4 0.707107
5 0.707107
6 -0.707107
7 -0.707107
Name: B, dtype: float64
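If scipy is not available, the same values can be sketched with a single transform call and a plain lambda. This is only an alternative, and it assumes every group has at least two rows: for a group with a single row the sample standard deviation is undefined, so that row's z-score comes out as NaN.

```python
import pandas as pd

test = pd.DataFrame(
    [[1, 10, 14], [1, 12, 14], [1, 20, 12], [1, 25, 12],
     [2, 18, 12], [2, 30, 14], [2, 4, 12], [2, 10, 14]],
    columns=['A', 'B', 'C'],
)

# The lambda receives one group's B values at a time and returns
# a Series of the same length, so the result lines up with `test`.
# x.std() uses ddof=1 by default, i.e. the sample standard deviation.
z = test.groupby(['A', 'C'])['B'].transform(lambda x: (x - x.mean()) / x.std())
print(z)
```

The lambda version is usually slower than the built-in 'mean' and 'std' transforms on large frames, so it is mainly useful when the per-group formula is not one of the built-in aggregations.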