Pandas 'multi-index' issue in merging dataframes
Question
I have a panel dataset, df:
stock year date return
VOD 2017 01-01 0.05
VOD 2017 01-02 0.03
VOD 2017 01-03 0.04
... ... ... ....
BAT 2017 01-01 0.05
BAT 2017 01-02 0.07
BAT 2017 01-03 0.10
So I use this code to get the mean and skewness of the returns for each stock in each year:
df2=df.groupby(['stock','year']).mean(numeric_only=True).reset_index()
df3=df.groupby(['stock','year']).skew(numeric_only=True).reset_index()
df2 and df3 look fine.
df2 looks like this (after I changed the column name):
stock year mean_return
VOD 2017 0.09
BAT 2017 0.14
... ... ...
df3 looks like this (after I changed the column name):
stock year return_skewness
VOD 2017 -0.34
BAT 2017 -0.04
... ... ...
The problem is that when I try to merge df2 and df3 using
want=pd.merge(df2,df3, on=['stock','year'],how='outer')
Python gives me the following error, which confuses me:
The column label 'stock' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
I can merge df2 and df3 with want = pd.merge(df2, df3, left_index=True, right_index=True, how='outer'), but afterwards I have to rename the columns, because the column names are in parentheses.
Is there a convenient way to merge df2 and df3? Thanks.
Recommended answer
It is better to use agg: select the column to aggregate after groupby, and pass the aggregate functions as a list of (new column name, function) tuples:
df3 = (df.groupby(['stock','year'])['return']
.agg([('mean_return','mean'),('return_skewness','skew')])
.reset_index())
print (df3)
stock year mean_return return_skewness
0 BAT 2017 0.073333 0.585583
1 VOD 2017 0.040000 0.000000
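pandas 0.25 and later also support named aggregation, where each keyword argument is the output column name and its value is the aggregate function. A minimal sketch, rebuilding the six sample rows shown in the question; it should produce the same table as the tuple-list version:

```python
import pandas as pd

# The six sample rows from the question
df = pd.DataFrame({
    "stock": ["VOD"] * 3 + ["BAT"] * 3,
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

# Named aggregation: keyword = new column name, value = aggregate function
df3 = (df.groupby(["stock", "year"])["return"]
         .agg(mean_return="mean", return_skewness="skew")
         .reset_index())
print(df3)
```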
Alternatively, your own solution can be fixed: select the return column for aggregation, remove reset_index, name each resulting Series with rename, and finally combine them with concat:
s2=df.groupby(['stock','year'])['return'].mean().rename('mean_return')
s3=df.groupby(['stock','year'])['return'].skew().rename('return_skewness')
df3 = pd.concat([s2, s3], axis=1).reset_index()
print (df3)
stock year mean_return return_skewness
0 BAT 2017 0.073333 0.585583
1 VOD 2017 0.040000 0.000000
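The question's original pd.merge call also works once both frames have ordinary single-level columns. A minimal sketch, assuming df2 and df3 are built from the same per-column Series as above and then turned back into frames with reset_index (the six sample rows are rebuilt so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "stock": ["VOD"] * 3 + ["BAT"] * 3,
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

g = df.groupby(["stock", "year"])["return"]
# reset_index on a named Series gives flat columns: stock, year, <name>
df2 = g.mean().rename("mean_return").reset_index()
df3 = g.skew().rename("return_skewness").reset_index()

# With single-level columns, merging on the key columns works as intended
want = pd.merge(df2, df3, on=["stock", "year"], how="outer")
print(want)
```

Here concat on the shared (stock, year) index and merge on the key columns are two routes to the same table; concat avoids the extra reset_index step.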
If you need to aggregate all numeric columns, drop the column selection after groupby, and then flatten the resulting MultiIndex columns with map and join:
print (df)
stock year date return col
0 VOD 2017 01-01 0.05 1
1 VOD 2017 01-02 0.03 8
2 VOD 2017 01-03 0.04 9
3 BAT 2017 01-01 0.05 1
4 BAT 2017 01-02 0.07 4
5 BAT 2017 01-03 0.10 3
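# drop the non-numeric 'date' column first; recent pandas versions raise a TypeError instead of silently skipping it
df3 = df.drop(columns='date').groupby(['stock','year']).agg(['mean','skew'])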
print (df3)
return col
mean skew mean skew
stock year
BAT 2017 0.073333 0.585583 2.666667 -0.935220
VOD 2017 0.040000 0.000000 6.000000 -1.630059
df3.columns = df3.columns.map('_'.join)
df3 = df3.reset_index()
print (df3)
stock year return_mean return_skew col_mean col_skew
0 BAT 2017 0.073333 0.585583 2.666667 -0.935220
1 VOD 2017 0.040000 0.000000 6.000000 -1.630059
Your own solution can also be changed like this, using add_prefix to keep the column names distinct:
df2=df.groupby(['stock','year']).mean(numeric_only=True).add_prefix('mean_')
df3=df.groupby(['stock','year']).skew(numeric_only=True).add_prefix('skew_')
df3 = pd.concat([df2, df3], axis=1).reset_index()
print (df3)
stock year mean_return mean_col skew_return skew_col
0 BAT 2017 0.073333 2.666667 0.585583 -0.935220
1 VOD 2017 0.040000 6.000000 0.000000 -1.630059
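Because df2 and df3 here share the same (stock, year) index, DataFrame.join, which aligns on the index by default, gives the same result as concat with axis=1. A minimal sketch on the sample frame with the extra col column; the variable names via_concat and via_join are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "stock": ["VOD"] * 3 + ["BAT"] * 3,
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
    "col": [1, 8, 9, 1, 4, 3],
})

keys = ["stock", "year"]
# numeric_only=True skips the string 'date' column
df2 = df.groupby(keys).mean(numeric_only=True).add_prefix("mean_")
df3 = df.groupby(keys).skew(numeric_only=True).add_prefix("skew_")

# Both frames are indexed by (stock, year), so either call aligns them
via_concat = pd.concat([df2, df3], axis=1).reset_index()
via_join = df2.join(df3).reset_index()
print(via_join)
```

join defaults to a left join, so if one frame could be missing some (stock, year) groups, pass how='outer' to keep them all, as concat does.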