Merge after groupby

Problem description

I'm having trouble using pd.merge after groupby. Here's my hypothetical:

import pandas as pd
from pandas import DataFrame
import numpy as np

df1 = DataFrame({'key': [1,1,2,2,3,3],
                 'var11': np.random.randn(6),
                 'var12': np.random.randn(6)})
df2 = DataFrame({'key': [1,2,3],
                 'var21': np.random.randn(3),
                 'var22': np.random.randn(3)})

#group var11 in df1 by key
grouped = df1['var11'].groupby(df1['key'])

# calculate the mean of var11 by key
grouped = grouped.mean()
print(grouped)
key
1      1.399430
2      0.568216
3     -0.612843
dtype: float64

print(grouped.index)
Int64Index([1, 2, 3], dtype='int64')

print(df2)
   key     var21     var22
0    1 -0.381078  0.224325
1    2  0.836719 -0.565498
2    3  0.323412 -1.616901

df2 = pd.merge(df2, grouped, left_on = 'key', right_index = True)

At this point, I get IndexError: list index out of range.

When using groupby, the grouping variable ('key' in this example) becomes the index for the resultant series, which is why I specify 'right_index = True'. I've tried other syntax without success. Any advice?
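
The point about the index can be checked directly. A minimal sketch, using made-up fixed values rather than the random data above (the name `grouped_mean` is only illustrative):

```python
import pandas as pd

# Deterministic stand-in for df1; the numbers are invented for illustration.
df = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3],
                   'var11': [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]})

# Grouping by 'key' and aggregating yields a Series indexed by the key values.
grouped_mean = df['var11'].groupby(df['key']).mean()

print(type(grouped_mean).__name__)   # the result is a Series, not a DataFrame
print(grouped_mean.index.name)       # the index carries the name of the grouping column
print(grouped_mean.index.tolist())   # one entry per distinct key
```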

Recommended answer

I think you should do this:

In [140]:

df2 = pd.merge(df2,
               grouped.to_frame('mean'),  # convert the Series to a one-column DataFrame named 'mean'
               left_on='key',
               right_index=True)
print(df2)
   key     var21     var22      mean
0    1  0.324476  0.701254  0.400313
1    2 -1.270500  0.055383 -0.293691
2    3  0.804864  0.566747  0.628787

[3 rows x 4 columns]

The reason it didn't work is that grouped is a Series, not a DataFrame.
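
For reference, here is a minimal sketch of two ways to hand pd.merge a DataFrame rather than a bare Series, assuming a reasonably recent pandas. The fixed sample values and the column name `var11_mean` are illustrative and not from the original post. As far as I know, newer pandas releases also accept a named Series on the right-hand side of a merge, so the original call may simply work there.

```python
import pandas as pd

# Small deterministic stand-ins for df1 and df2 (values are made up).
df1 = pd.DataFrame({'key': [1, 1, 2, 2, 3, 3],
                    'var11': [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]})
df2 = pd.DataFrame({'key': [1, 2, 3],
                    'var21': [0.1, 0.2, 0.3]})

# Option 1: turn the grouped Series into a one-column DataFrame,
# then merge df2's 'key' column against the grouped index.
means = df1.groupby('key')['var11'].mean().to_frame('var11_mean')
merged_a = pd.merge(df2, means, left_on='key', right_index=True)

# Option 2: keep 'key' as an ordinary column so both sides merge on it.
means_col = (df1.groupby('key', as_index=False)['var11'].mean()
                .rename(columns={'var11': 'var11_mean'}))
merged_b = pd.merge(df2, means_col, on='key')

print(merged_a)
print(merged_b)
```

Option 2 avoids right_index entirely, which can be easier to read when the grouping key is needed as a regular column afterwards.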
