pandas merge with MultiIndex, when only one level of index is to be used as key


Problem description

I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and '_ItemId'). There are multiple instances of each value of '_ItemId', like this:

                              _SomeOtherLabel
 _Date            _ItemId     
 2014-10-05       6588921     AA
                  6592520     AB 
                  6836143     BA
 2014-10-11       6588921     CA
                  6592520     CB
                  6836143     DA 

I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:

                  _ItemId       _Cat
  0               6588921       6_1
  1               6592520       6_1
  2               6836143       7_1

I want to recover the values in the column '_Cat' from df2 and merge them into df1 for the appropriate values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the appropriate key for the left df is one of the MultiIndex levels. I tried this:

df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')  

but I get the error:

   ValueError: len(right_on) must equal the number of levels in the index of "left"

which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?

Thanks

Solution

I can think of two ways of doing this.

Use set_index() and join():

>>> df1.join(df2.set_index('_ItemId'))
                   _SomeOtherLabel _Cat
_Date      _ItemId                     
2014-10-05 6588921              AA  6_1
           6592520              AB  6_1
           6836143              BA  7_1
2014-10-11 6588921              CA  6_1
           6592520              CB  6_1
           6836143              DA  7_1
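
For anyone who wants to reproduce this, here is a self-contained sketch of the first approach. The frames are rebuilt by hand from the tables in the question, and storing the dates as plain strings is an assumption made for brevity. As far as I understand it, join() matches the index name of the right frame (_ItemId) against the level of the same name in the left MultiIndex, which is why no explicit key is needed.

```python
import pandas as pd

# Rebuild df1 from the question: a 2-level MultiIndex (_Date, _ItemId).
# Dates are kept as plain strings here (an assumption, for brevity).
index = pd.MultiIndex.from_product(
    [["2014-10-05", "2014-10-11"], [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]},
    index=index,
)

# Rebuild df2 from the question: _ItemId is an ordinary column, one row per item.
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

# Make _ItemId the index of df2, then join; the index name is matched
# against the MultiIndex level with the same name in df1.
result = df1.join(df2.set_index("_ItemId"))
print(result)
```

The default for join() is a left join, so every row of df1 is kept and df1's MultiIndex is preserved in the result.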

Or use reset_index(), merge(), and then set the new MultiIndex.
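
The second approach could look roughly like the sketch below. Again the frames are reconstructed from the question with string dates as a simplifying assumption, and how='left' is my own choice so that every row of df1 survives even if an item has no category.

```python
import pandas as pd

# Same hand-built frames as in the question (string dates are an assumption).
index = pd.MultiIndex.from_product(
    [["2014-10-05", "2014-10-11"], [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]},
    index=index,
)
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

merged = (
    df1.reset_index()                        # move _Date and _ItemId into columns
       .merge(df2, on="_ItemId", how="left")  # ordinary many-to-one merge on the column
       .set_index(["_Date", "_ItemId"])       # restore the original MultiIndex
)
print(merged)
```

This is more verbose than join(), but it makes the merge key explicit and works the same way when the key columns have different names on each side (using left_on and right_on instead of on).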

I think the first approach should be faster, but I'm not sure.
