DataFrame subtract group-wise means

Question

I have a DataFrame whose columns can be divided into different groups. I need to return a DataFrame in which each entry is the original value minus its group mean.
I did the following using groupby, which gives me the group means.

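import numpy as np
import pandas as pd
from datetime import datetime, timedelta

base = datetime.today().date()
date_list = [base - timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)), index=date_list, columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])

# Copy one existing row (today's; a hard-coded date may not be in the index)
# and overwrite it with the group id of each column
xx = df.loc[[base]].copy()
xx.index = ['group']
xx.a1 = 1
xx.a2 = 1
xx.a3 = 1
xx.b3 = 2
xx.b2 = 2
xx.b1 = 2
xx.c1 = 3
xx.c2 = 3
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, xx])
dft = df.T
# Group the transposed rows (the original columns) by their group id
dft.groupby(['group']).mean().T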
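The group means computed this way have one column per group, so they cannot be subtracted from the original frame directly. A minimal sketch of one way to expand them back to the original columns is shown below; the toy values and the names labels, group_means, expanded and demeaned are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: columns a1, a2 belong to group 1; b1, b2 to group 2.
df = pd.DataFrame({'a1': [2, 4], 'a2': [4, 8], 'b1': [5, 7], 'b2': [7, 9]})
labels = np.array([1, 1, 2, 2])  # one group id per column

# Group the transposed frame, then transpose back: one column per group.
group_means = df.T.groupby(labels).mean().T

# Repeat each group's mean column once per member column,
# then restore the original column names so the frames align.
expanded = group_means.loc[:, labels].set_axis(df.columns, axis=1)

demeaned = df - expanded
print(demeaned)
```

The extra reindexing step is what a transform-based approach does implicitly.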

Update 20/05/16:

Aided by unutbu's answer, I came up with the following solution as well:

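# group holds one label per column of df (built as in the answer below).
# x.mean() is the column-wise mean of each group's block; np.mean(x) may
# reduce to a single scalar on recent pandas. group_keys=False keeps the
# original row labels instead of prepending the group key.
df.T.groupby(group, group_keys=False).apply(lambda x: x - x.mean()).T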

Recommended answer

If you use the transform method, e.g.

means = df.groupby(group, axis=1).transform('mean')

then transform will return a DataFrame of the same shape as df. This makes it easier to subtract means from df.
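To make the shape difference concrete, here is a small sketch comparing an aggregated mean with a transformed mean. The toy values are assumptions for illustration, and the sketch groups the transposed frame rather than passing axis=1, because the axis argument of groupby is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two columns in group 'a', one in group 'b'.
df = pd.DataFrame({'a1': [1, 2], 'a2': [3, 6], 'b1': [10, 20]})
group = np.array(['a', 'a', 'b'])  # one label per column

# Transposing turns the column groups into row groups.
grouped = df.T.groupby(group)

agg_means = grouped.mean().T          # one column per group
means = grouped.transform('mean').T   # one column per original column

print(agg_means.shape)  # (2, 2)
print(means.shape)      # (2, 3)

# Same shape and labels as df, so plain subtraction lines up.
result = df - means
print(result)
```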

You can also pass a sequence, such as group=[1,1,1,2,2,3,3], to df.groupby instead of passing a column name. df.groupby(group, axis=1) will group the columns based on the sequence values. So, for example, to group according to the non-numeric part of each column name, you could use:
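As a quick illustration of grouping by a sequence, the sketch below derives labels from the column names and compares them with a hand-written sequence. The array manual and the arange data are assumptions made for the example; as a precaution against integer labels being read as column names, the hand-written labels are passed as a NumPy array rather than a plain list.

```python
import numpy as np
import pandas as pd

columns = pd.Index(['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])

# Non-digit prefix of each column name: one group label per column.
group = columns.str.extract(r'(\D+)', expand=False)
print(list(group))  # ['a', 'a', 'b', 'a', 'b', 'c', 'c', 'b']

# A hand-written sequence with the same grouping pattern (hypothetical ids).
manual = np.array([1, 1, 2, 1, 2, 3, 3, 2])

df = pd.DataFrame(np.arange(16).reshape(2, 8), columns=columns)

# Group the transposed frame so no axis argument is needed.
by_name = df.T.groupby(group).mean().T
by_seq = df.T.groupby(manual).mean().T

# Only the group labels differ; the means are the same.
print(np.allclose(by_name.to_numpy(), by_seq.to_numpy()))  # True
```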

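import numpy as np
import pandas as pd
import datetime as DT

np.random.seed(2016)
base = DT.date.today()
date_list = [base - DT.timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)),
                  index=date_list,
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])

# One label per column: the non-digit prefix of its name ('a', 'b' or 'c')
group = df.columns.str.extract(r'(\D+)', expand=False)
# Equivalent to df.groupby(group, axis=1).transform('mean'); grouping the
# transposed frame avoids axis=1, which is deprecated since pandas 2.1
means = df.T.groupby(group).transform('mean').T
result = df - means
print(result)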

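Printing means (rather than result) yields the per-row group means, which are identical within each group: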

            a1  a2  b1  a3  b2  c1  c2  b3
2016-05-18  29  29  53  29  53  23  23  53
2016-05-17  55  55  32  55  32  92  92  32
2016-05-16  59  59  53  59  53  50  50  53
2016-05-15  46  46  30  46  30  55  55  30
2016-05-14  56  56  28  56  28  28  28  28
2016-05-13  34  34  36  34  36  70  70  36
2016-05-12  39  39  64  39  64  48  48  64
2016-05-11  45  45  59  45  59  57  57  59
2016-05-10  55  55  30  55  30  37  37  30
2016-05-09  61  61  59  61  59  59  59  59
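One way to sanity-check the demeaned frame is to confirm that, within each group, the values in every row sum to roughly zero. The sketch below is an assumption-laden variant of the example above: it uses a default integer index instead of dates and NumPy's default_rng generator, so its random values differ from the table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2016)
df = pd.DataFrame(rng.integers(1, 100, (10, 8)),
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])
group = df.columns.str.extract(r'(\D+)', expand=False)

# Row-wise group means, broadcast back to the shape of df.
means = df.T.groupby(group).transform('mean').T
result = df - means

# Deviations from a mean cancel out, so each group's row sum should be ~0.
group_sums = result.T.groupby(group).sum().T
print(np.allclose(group_sums.to_numpy(), 0))  # True
```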
