Normalize DataFrame by group
Problem description
Let's say that I have some data generated as follows:
import numpy as np
N = 20
m = 3
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3
and then I create a categorical variable:
indx = np.random.randint(0,3,size=N).astype(np.int32)
and generate a DataFrame:
import pandas as pd
df = pd.DataFrame(np.hstack((data, indx[:,None])),
                  columns=['a%s' % k for k in range(m)] + ['indx'])
I can get the mean value per group with:
df.groupby('indx').mean()
What I'm unsure of is how to then subtract each group's mean from the original data, column by column, so that the data in each column is normalized by the mean within its group. Any suggestions would be appreciated.
Recommended answer
In [10]: df.groupby('indx').transform(lambda x: (x - x.mean()) / x.std())
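should do it. Note that this also divides by the within-group standard deviation; to subtract only the group mean, drop the `/ x.std()` part.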
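For anyone who wants to check the result, here is a minimal, self-contained sketch of both variants (mean subtraction only, and the full z-score from the answer). It is not part of the original answer. The fixed seed, the deterministic `indx` labels, and the names `value_cols`, `grouped`, `demeaned` and `standardized` are assumptions made for illustration; the deterministic labels just make sure every group has more than one row, so the standard deviation is defined.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the example is reproducible.
rng = np.random.default_rng(0)
N, m = 20, 3
data = rng.normal(size=(N, m)) + rng.normal(size=(N, m)) ** 3

# Assumption: deterministic group labels (0, 1, 2, 0, 1, 2, ...) instead of
# random ones, so that every group is guaranteed to have several rows.
indx = np.arange(N) % 3

value_cols = ['a%s' % k for k in range(m)]
df = pd.DataFrame(data, columns=value_cols)
df['indx'] = indx

grouped = df.groupby('indx')[value_cols]

# Variant 1: subtract the group mean only (what the question asks for).
# transform('mean') returns a frame aligned with df's index, one row per input row.
demeaned = df[value_cols] - grouped.transform('mean')

# Variant 2: the z-score from the answer (subtract the mean, divide by the std).
standardized = grouped.transform(lambda x: (x - x.mean()) / x.std())

# Sanity checks: group means are ~0, and group stds are ~1 after standardizing.
print(demeaned.groupby(df['indx']).mean().round(12))
print(standardized.groupby(df['indx']).std().round(12))
```

Both results keep the original row order and index, so they can be assigned back to `df` or joined with the `indx` column directly.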