How to join a MultiIndex DataFrame with a single-index DataFrame?


Question

The single index of df1 (sngl below) matches a sublevel of the MultiIndex of df2 (mult below). Both have the same columns. I want to copy all rows and columns of df1 into df2.

It is similar to this thread: Copying a single-index DataFrame into a MultiIndex DataFrame

But that solution only works for one index value, the index 'a' in that case. I want to do this operation for every index value of df1.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import itertools
In [4]: inner = ('a','b')
In [5]: outer = ((10,20), (1,2))
In [6]: cols = ('one','two','three','four')
In [7]: sngl = pd.DataFrame(np.random.randn(2,4), index=inner, columns=cols)
In [8]: index_tups = list(itertools.product(*(outer + (inner,))))
In [9]: index_mult = pd.MultiIndex.from_tuples(index_tups)
In [10]: mult = pd.DataFrame(index=index_mult, columns=cols)
In [11]: sngl
Out[11]: 
        one       two     three      four
a  2.946876 -0.751171  2.306766  0.323146
b  0.192558  0.928031  1.230475 -0.256739

In [12]: mult
Out[12]: 
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN


In [13]: mult.ix[(10,1)] = sngl  # note: .ix was removed in pandas 1.0; current versions need .loc

In [14]: mult
Out[14]: 
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN

The solution given by @Jeff is:

nm = mult.reset_index().set_index('level_2')
nm.loc['a',sngl.columns] = sngl.loc['a'].values

         level_0  level_1        one        two     three        four
level_2                                                              
a             10        1  0.3738456 -0.2261926 -1.205177  0.08448757
b             10        1        NaN        NaN       NaN         NaN
a             10        2  0.3738456 -0.2261926 -1.205177  0.08448757
b             10        2        NaN        NaN       NaN         NaN
a             20        1  0.3738456 -0.2261926 -1.205177  0.08448757
b             20        1        NaN        NaN       NaN         NaN
a             20        2  0.3738456 -0.2261926 -1.205177  0.08448757
b             20        2        NaN        NaN       NaN         NaN

But I cannot do this:

nm.loc[:,sngl.columns] = sngl.loc[:].values

It raises ValueError: "cannot copy sequence with size X to array axis with dimension Y"
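One plausible reading of this error is a row-count mismatch: the target selection covers every row of nm, while sngl only has one row per inner label. The sketch below rebuilds the frames from the question (using lists instead of tuples, which is an incidental change) and compares the two shapes without attempting the failing assignment.

```python
import itertools

import numpy as np
import pandas as pd

inner = ["a", "b"]
cols = ["one", "two", "three", "four"]

sngl = pd.DataFrame(np.random.randn(2, 4), index=inner, columns=cols)
index_mult = pd.MultiIndex.from_tuples(
    list(itertools.product([10, 20], [1, 2], inner))
)
mult = pd.DataFrame(index=index_mult, columns=cols)

# Same reshaping as in @Jeff's solution: unnamed levels become
# the columns level_0, level_1 and level_2.
nm = mult.reset_index().set_index("level_2")

target_shape = nm.loc[:, sngl.columns].shape  # all rows of nm
source_shape = sngl.loc[:].values.shape       # one row per inner label
print(target_shape, source_shape)
```

The right-hand side would need to be repeated once per outer-level combination before a plain array assignment could fit.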

I am currently using a loop, but that is not the pandas way.

Recommended answer

This feels a little too manual, but in practice I might do something like this:

In [46]: mult[:] = sngl.loc[mult.index.get_level_values(2)].values

In [47]: mult
Out[47]: 
             one       two     three      four
10 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
20 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149

That is, first select the labels we want to index with:

In [64]: mult.index.get_level_values(2)
Out[64]: Index(['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'], dtype='object')

Then use these to index into sngl:

In [65]: sngl.loc[mult.index.get_level_values(2)]
Out[65]: 
        one       two     three      four
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149

Then we can use .values to throw away the index information and get just the raw array to fill with.
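The three steps above can be put together as a self-contained sketch. Two details here are assumptions rather than part of the answer: the seed is chosen only for reproducibility, and the result is built as a new DataFrame (named filled, a hypothetical name) with .to_numpy(), the newer counterpart of .values, instead of assigning into mult in place.

```python
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is arbitrary, for reproducibility only
inner = ["a", "b"]
cols = ["one", "two", "three", "four"]

sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=inner, columns=cols)
index_mult = pd.MultiIndex.from_tuples(
    list(itertools.product([10, 20], [1, 2], inner))
)

# Step 1: the innermost label of every row of the MultiIndex.
labels = index_mult.get_level_values(2)

# Step 2: repeat the rows of sngl in that order (2 rows become 8).
expanded = sngl.loc[labels]

# Step 3: drop the single-level index and attach the MultiIndex instead.
filled = pd.DataFrame(expanded.to_numpy(), index=index_mult, columns=sngl.columns)
print(filled)
```

Building a new frame keeps the float dtype of sngl; assigning into an empty frame such as mult may leave the columns with object dtype.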

It's not very elegant, but it's straightforward.
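As a possible alternative that is not part of the answer above, reindex accepts a level argument that broadcasts values across one level of a MultiIndex. The sketch below assumes the same frame layout as the question, with a fixed seed and the hypothetical name filled for the result.

```python
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is arbitrary, for reproducibility only
inner = ["a", "b"]
cols = ["one", "two", "three", "four"]

sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=inner, columns=cols)
index_mult = pd.MultiIndex.from_tuples(
    list(itertools.product([10, 20], [1, 2], inner))
)

# Match sngl's index against level 2 of the MultiIndex and
# broadcast each row over all combinations of the outer levels.
filled = sngl.reindex(index=index_mult, level=2)
print(filled)
```

This returns a new DataFrame indexed by index_mult rather than filling mult in place, so it suits cases where the empty target frame is not needed.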
