在 pandas 中查询MultiIndex DataFrame [英] Querying MultiIndex DataFrame in Pandas
问题描述
我有一个看起来像这样的DataFrame:
I have a DataFrame that looks like this:
FirstDF=
C
A B
'a' 'blue' 43
'green' 59
'b' 'red 56
'c' 'green' 80
'orange' 72
其中A和B设置为索引.我也有一个看起来像这样的DataFrame:
Where A and B are set as indexes. I also have a DataFrame that looks like:
SecondDF=
A B
0 'a' 'green'
1 'b' 'red'
2 'c' 'green'
有没有一种方法可以直接查询第一个DataFrame和最后一个DataFrame,并获得如下所示的输出?
Is there a way I can directly query the first DataFrame with the last one, and obtain an output like the following?
C
59
56
80
我通过遍历第二个DataFrame来做到这一点,如下所示,但是我想使用pandas逻辑而不是for循环来实现.
I did it by iterating over the second DataFrame, as shown below, but I would like to do it using pandas logic instead of for loops.
data=[]
for i in range(SecondDF.shape[0]):
data.append(FirstDF.loc[tuple(SecondDF.iloc[i])])
data=pd.Series(data)
推荐答案
使用 merge
,参数为left_index
和right_on
:
df = FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'])['C'].to_frame()
print (df)
C
0 59
1 56
2 80
使用 isin
MultiIndex
es,并通过 boolean indexing
过滤:
Another solution with isin
of MultiIndex
es and filtering by boolean indexing
:
mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[mask, ['C']].reset_index(drop=True)
print (df)
C
0 59
1 56
2 80
详细信息:
print (FirstDF.loc[mask, ['C']])
C
A B
'a' 'green' 59
'b' 'red' 56
'c' 'green' 80
您可以将merge
与外部联接和indicator=True
参数一起使用,然后通过 boolean indexing
:
You can use merge
with outer join and indicator=True
parameter, then filter by boolean indexing
:
df1=FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'], indicator=True, how='outer')
print (df1)
C A B _merge
2 43 'a' 'blue' left_only
0 59 'a' 'green' both
1 56 'b' 'red' both
2 80 'c' 'green' both
2 72 'c' 'orange' left_only
mask = df1['_merge'] != 'both'
df1 = df1.loc[mask, ['C']].reset_index(drop=True)
print (df1)
C
0 43
1 72
对于第二个解决方案,按~
反转布尔值掩码:
For second solution invert boolen mask by ~
:
mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[~mask, ['C']].reset_index(drop=True)
print (df)
C
0 43
1 72
这篇关于在 pandas 中查询MultiIndex DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!