Merge on single level of MultiIndex

Question

Is there any way to merge on a single level of a MultiIndex without resetting the index?

I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of time-varying fields, indexed by ObjectID+Date. I'd like to join these tables together.

Right now, the best I can think of is:

dynamic.reset_index().merge(static, left_on=['ObjectID'], right_index=True)

However, the dynamic table is very big, and I don't want to have to muck around with its index in order to combine the values.

Recommended answer

Yes, since pandas 0.14.0 it has been possible to merge a singly-indexed DataFrame with a level of a multi-indexed DataFrame using .join.

df1.join(df2, how='inner') # how='outer' keeps all records from both data frames
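As a minimal sketch of the scenario in the question, the example below builds a small "static" table indexed by ObjectID and a "dynamic" table indexed by ObjectID and Date, then joins them on the shared level. The column names (Color, Value) and all the data are made up for illustration. The sketch assumes the index name of the static table matches a level name of the dynamic table, which is what lets .join pick the level.

```python
import pandas as pd

# Hypothetical time-invariant table, indexed by ObjectID only.
static = pd.DataFrame(
    {"Color": ["red", "blue"]},
    index=pd.Index([1, 2], name="ObjectID"),
)

# Hypothetical time-varying table, indexed by ObjectID + Date.
dynamic = pd.DataFrame(
    {"Value": [10.0, 11.0, 20.0, 30.0]},
    index=pd.MultiIndex.from_tuples(
        [
            (1, "2014-01-01"),
            (1, "2014-01-02"),
            (2, "2014-01-01"),
            (3, "2014-01-01"),  # no matching ObjectID in `static`
        ],
        names=["ObjectID", "Date"],
    ),
)

# Join on the shared "ObjectID" level; the MultiIndex of `dynamic` is kept.
joined = dynamic.join(static, how="inner")
print(joined)
```

With how='inner', rows whose ObjectID is missing from the static table are dropped; how='left' would keep them with missing values in the static columns.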

The pandas 0.14 docs describe this as equivalent to, but more memory efficient and faster than:

pd.merge(df1.reset_index(),
         df2.reset_index(),
         on=['index1'],
         how='inner'
        ).set_index(['index1','index2'])
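To make the equivalence concrete, here is a small sketch that runs both forms on toy data and compares the results. The frames df1 and df2, their columns x and y, and the values are hypothetical; only the level names index1 and index2 come from the snippet above. Both results are sorted before comparing, since row order is not part of the claim.

```python
import pandas as pd

# Hypothetical multi-indexed frame with levels "index1" and "index2".
df1 = pd.DataFrame(
    {"x": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1), ("c", 1)],
        names=["index1", "index2"],
    ),
)

# Hypothetical singly-indexed frame whose index name matches one level.
df2 = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0]},
    index=pd.Index(["a", "b", "d"], name="index1"),
)

# Form 1: join directly on the shared level.
via_join = df1.join(df2, how="inner").sort_index()

# Form 2: reset both indexes, merge on the key column, rebuild the MultiIndex.
via_merge = (
    pd.merge(df1.reset_index(), df2.reset_index(), on=["index1"], how="inner")
    .set_index(["index1", "index2"])
    .sort_index()
)

# Compare index values and data row by row.
same = (
    via_join.reset_index().values.tolist()
    == via_merge.reset_index().values.tolist()
)
print(same)
```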

The docs also mention that .join cannot be used to merge two multi-indexed DataFrames on a single level, and from the GitHub tracker discussion for the previous issue, it seems that implementing this might not be a priority:

so I merged in the single join, see #6363; along with some docs on how to do a multi-multi join. That's fairly complicated to actually implement, and IMHO not worth the effort as it really doesn't change the memory usage/speed that much at all.

However, there is a GitHub conversation regarding this, where there has been some recent development: https://github.com/pydata/pandas/issues/6360. It is also possible to achieve this by resetting the indices, as mentioned earlier and as described in the docs.

It is now possible to merge multi-indexed data frames with each other. As per the release notes:

import pandas as pd

index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                        ('K1', 'X2')],
                                       names=['key', 'X'])

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']}, index=index_left)

index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                        ('K2', 'Y2'), ('K2', 'Y3')],
                                        names=['key', 'Y'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)

left.join(right)

Out:

            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]
