pandas groupby apply on multiple columns to generate a new column

Views: 921  Published: 2020/5/24 3:38:40  Tags: python pandas pandas-groupby pandas-apply


Problem description

I would like to generate a new column in a pandas DataFrame using groupby-apply.

For example, I have a DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

and I try to generate a new column 'D' with groupby-apply.

This works:

df = df.assign(D=df.groupby('B').C.apply(lambda x: x - x.mean()))

because (I think) it returns a Series with the same index as the DataFrame:

In [4]: df.groupby('B').C.apply(lambda x: x - x.mean())
Out[4]:
0   -0.5
1   -0.5
2    0.5
3    0.5
Name: C, dtype: float64
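
The reason a like-indexed result can be assigned directly is that pandas matches an assigned Series to rows by index label. A minimal sketch of that alignment, reusing the frame above; the names shuffled and aligned are hypothetical and only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

# A Series whose labels match df.index, deliberately listed out of order
shuffled = pd.Series([0.5, -0.5, 0.5, -0.5], index=[3, 0, 2, 1])

# assign matches values to rows by index label, not by position
aligned = df.assign(D=shuffled)
```

Each value lands on the row carrying the same label, so the order in which a groupby result lists its rows does not matter as long as its index has the same single level of labels as the frame.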

But if I compute the new column from multiple columns, I cannot assign the result directly. So this doesn't work:

 df.assign(D=df.groupby('B').apply(lambda x: x.A - x.C.mean()))

TypeError: incompatible index of inserted column with frame index

In fact, the groupby-apply returns a Series indexed by both the group key and the original index:

In [8]: df.groupby('B').apply(lambda x: x.A - x.C.mean())
Out[8]:
B
A  0    0.5
   2    2.5
B  1    1.5
   3    3.5
Name: A, dtype: float64

I can do

df.groupby('B').apply(lambda x: x.A - x.C.mean()).reset_index(level=0, drop=True)

but it seems verbose, and I am not sure whether it will always work as expected.

So my questions are: (i) when does pandas groupby-apply return a like-indexed Series versus a MultiIndex Series? (ii) is there a better way to assign a new column computed by groupby-apply from multiple columns?
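
For question (ii), one commonly used alternative is to compute only the per-group statistic with transform, which broadcasts it back onto the original rows, and then combine columns with ordinary vectorized arithmetic. A minimal sketch, assuming the per-group part of the computation reduces to a single-column aggregate (here the mean of C); the name group_mean_c is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

# transform('mean') returns one value per original row, indexed like df
group_mean_c = df.groupby('B')['C'].transform('mean')

# Plain column arithmetic; both operands share df's index
df = df.assign(D=df['A'] - group_mean_c)
```

Because transform always returns a result indexed like its input, this form does not depend on how apply decides to assemble its output.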
