How to join a MultiIndex DataFrame with a single-index DataFrame?
Question
The single index of df1 matches a sublevel of the MultiIndex of df2. Both have the same columns. I want to copy all rows and columns of df1 into df2.
It is similar to this thread: Copying a single-index DataFrame into a MultiIndex DataFrame
But that solution only works for one index value, the index 'a' in that case. I want to do this operation for every index value of df1.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import itertools
In [4]: inner = ('a','b')
In [5]: outer = ((10,20), (1,2))
In [6]: cols = ('one','two','three','four')
In [7]: sngl = pd.DataFrame(np.random.randn(2,4), index=inner, columns=cols)
In [8]: index_tups = list(itertools.product(*(outer + (inner,))))
In [9]: index_mult = pd.MultiIndex.from_tuples(index_tups)
In [10]: mult = pd.DataFrame(index=index_mult, columns=cols)
In [11]: sngl
Out[11]:
        one       two     three      four
a  2.946876 -0.751171  2.306766  0.323146
b  0.192558  0.928031  1.230475 -0.256739
In [12]: mult
Out[12]:
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
In [13]: mult.ix[(10,1)] = sngl
In [14]: mult
Out[14]:
        one  two three four
10 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
20 1 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
   2 a  NaN  NaN   NaN  NaN
     b  NaN  NaN   NaN  NaN
The solution given by @Jeff is:
nm = mult.reset_index().set_index('level_2')
nm.loc['a',sngl.columns] = sngl.loc['a'].values
         level_0  level_1        one         two      three        four
level_2
a             10        1  0.3738456  -0.2261926  -1.205177  0.08448757
b             10        1        NaN         NaN        NaN         NaN
a             10        2  0.3738456  -0.2261926  -1.205177  0.08448757
b             10        2        NaN         NaN        NaN         NaN
a             20        1  0.3738456  -0.2261926  -1.205177  0.08448757
b             20        1        NaN         NaN        NaN         NaN
a             20        2  0.3738456  -0.2261926  -1.205177  0.08448757
b             20        2        NaN         NaN        NaN         NaN
I cannot do this:
nm.loc[:,sngl.columns] = sngl.loc[:].values
It raises ValueError: "cannot copy sequence with size X to array axis with dimension Y".
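The mismatch is visible from the shapes alone: the left-hand selection covers every row of nm, while sngl has only one row per inner label. A minimal sketch, assuming frames built like the ones above (the seeded random generator and the names `target_shape` and `source_shape` are illustrative, not part of the original session):

```python
import itertools

import numpy as np
import pandas as pd

inner = ("a", "b")
outer = ((10, 20), (1, 2))
cols = ["one", "two", "three", "four"]

# Seeded generator so the sketch is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=list(inner), columns=cols)

index_tups = list(itertools.product(*(outer + (inner,))))
mult = pd.DataFrame(np.nan, index=pd.MultiIndex.from_tuples(index_tups), columns=cols)

# Unnamed index levels become columns level_0, level_1, level_2 after reset_index().
nm = mult.reset_index().set_index("level_2")

target_shape = nm.loc[:, sngl.columns].shape  # rows to fill
source_shape = sngl.to_numpy().shape          # rows available
print(target_shape, source_shape)  # (8, 4) (2, 4)
```

The raw array has 2 rows but the target has 8, so the values have to be repeated per label before they can be assigned positionally.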
I am currently using a loop, but this is not the pandas way.
Recommended answer
This feels a little too manual, but in practice I might do something like this:
In [46]: mult[:] = sngl.loc[mult.index.get_level_values(2)].values
In [47]: mult
Out[47]:
             one       two     three      four
10 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
20 1 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
   2 a  1.175042  0.044014  1.341404 -0.223872
     b  0.216168 -0.748194 -0.546003 -0.501149
That is, first select the elements we want to use to index:
In [64]: mult.index.get_level_values(2)
Out[64]: Index(['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'], dtype='object')
then use these to index into sngl:
In [65]: sngl.loc[mult.index.get_level_values(2)]
Out[65]:
        one       two     three      four
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
a  1.175042  0.044014  1.341404 -0.223872
b  0.216168 -0.748194 -0.546003 -0.501149
and then we can use .values to throw away the indexing information and just get the raw array to fill with.
It's not very elegant, but it's straightforward.
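For anyone who wants to run the three steps outside an IPython session, here is a self-contained sketch. It assumes a recent pandas and makes a few substitutions that are not in the original: a seeded random generator, `MultiIndex.from_product` instead of `from_tuples`, a float-typed empty frame, and `.to_numpy()` as the equivalent of `.values`. The name `labels` is illustrative.

```python
import numpy as np
import pandas as pd

inner = ("a", "b")
outer = ((10, 20), (1, 2))
cols = ["one", "two", "three", "four"]

# Seeded generator so repeated runs give the same numbers (an assumption).
rng = np.random.default_rng(0)
sngl = pd.DataFrame(rng.standard_normal((2, 4)), index=list(inner), columns=cols)

# Same index as in the question: (10|20) x (1|2) x (a|b).
index_mult = pd.MultiIndex.from_product(outer + (inner,))
mult = pd.DataFrame(np.nan, index=index_mult, columns=cols)

# Step 1: the innermost label of every row of mult.
labels = mult.index.get_level_values(2)

# Step 2: look each label up in sngl, repeating rows as needed.
# Step 3: drop the index with to_numpy() and fill mult positionally.
mult[:] = sngl.loc[labels].to_numpy()

print(mult)
```

Because the fill is positional, this relies on sngl and mult having their columns in the same order, which the question states is the case.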