pandas merge with MultiIndex, when only one level of index is to be used as key
I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and '_ItemId'). There are multiple instances of each value of '_ItemId', like this:
                    _SomeOtherLabel
_Date      _ItemId
2014-10-05 6588921  AA
           6592520  AB
           6836143  BA
2014-10-11 6588921  CA
           6592520  CB
           6836143  DA
I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:
   _ItemId _Cat
0  6588921  6_1
1  6592520  6_1
2  6836143  7_1
I want to recover the values in the column '_Cat' from df2 and merge them into df1 for the appropriate values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the appropriate key for the left df is one of the MultiIndex levels. I tried this:
df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')
but I get the error
ValueError: len(right_on) must equal the number of levels in the index of "left"
which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?
Thanks
I can think of two ways of doing this.
Use set_index() and join():
>>> df1.join(df2.set_index('_ItemId'))
                   _SomeOtherLabel _Cat
_Date      _ItemId
2014-10-05 6588921              AA  6_1
           6592520              AB  6_1
           6836143              BA  7_1
2014-10-11 6588921              CA  6_1
           6592520              CB  6_1
           6836143              DA  7_1
Or use reset_index(), merge(), and then set_index() to restore the MultiIndex.
I think the first approach should be faster, but I am not sure.
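For anyone who wants to try both approaches side by side, here is a minimal sketch. It rebuilds df1 and df2 from the tables in the question; storing the dates as plain strings and the choice of how="left" in the merge are my assumptions, not something the question specifies. The names joined and merged are only illustrative.

```python
import pandas as pd

# Rebuild the frames from the question (dates kept as strings: an assumption).
index = pd.MultiIndex.from_product(
    [["2014-10-05", "2014-10-11"], [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]}, index=index
)
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

# Approach 1: give df2 an index named '_ItemId', then join.
# join() matches that index against the '_ItemId' level of df1's MultiIndex.
joined = df1.join(df2.set_index("_ItemId"))

# Approach 2: move the index levels into columns, merge on the shared column,
# then rebuild the two-level index.
merged = (
    df1.reset_index()
    .merge(df2, on="_ItemId", how="left")  # how="left" keeps every row of df1
    .set_index(["_Date", "_ItemId"])
)

print(joined)
print(merged)
```

Both results should carry the same '_Cat' value for each ('_Date', '_ItemId') pair. Which one is faster probably depends on the data size, so it is worth timing on the real frames.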