在 pandas 中查询MultiIndex DataFrame [英] Querying MultiIndex DataFrame in Pandas

查看:77
本文介绍了在 pandas 中查询MultiIndex DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的DataFrame:

I have a DataFrame that looks like this:

FirstDF=
              C
A    B      
'a' 'blue'   43
    'green'  59
'b' 'red     56
'c' 'green'  80
    'orange' 72

其中A和B设置为索引.我也有一个看起来像这样的DataFrame:

Where A and B are set as indexes. I also have a DataFrame that looks like:

SecondDF=

    A     B
0  'a'  'green'
1  'b'  'red'
2  'c'  'green'

有没有一种方法可以直接查询第一个DataFrame和最后一个DataFrame,并获得如下所示的输出?

Is there a way I can directly query the first DataFrame with the last one, and obtain an output like the following?

C
59
56
80

我通过遍历第二个DataFrame来做到这一点,如下所示,但是我想使用pandas逻辑而不是for循环来实现.

I did it by iterating over the second DataFrame, as shown below, but I would like to do it using pandas logic instead of for loops.

data=[]
for i in range(SecondDF.shape[0]):
    data.append(FirstDF.loc[tuple(SecondDF.iloc[i])])
data=pd.Series(data)

推荐答案

使用 merge ,参数为left_indexright_on:

df = FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'])['C'].to_frame()
print (df)
    C
0  59
1  56
2  80

使用 isin MultiIndex es,并通过 boolean indexing 过滤:

Another solution with isin of MultiIndexes and filtering by boolean indexing:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[mask, ['C']].reset_index(drop=True)
print (df)
    C
0  59
1  56
2  80

详细信息:

print (FirstDF.loc[mask, ['C']])
              C
A   B          
'a' 'green'  59
'b' 'red'    56
'c' 'green'  80

您可以将merge与外部联接和indicator=True参数一起使用,然后通过 boolean indexing :

You can use merge with outer join and indicator=True parameter, then filter by boolean indexing:

df1=FirstDF.merge(SecondDF, left_index=True, right_on=['A','B'], indicator=True, how='outer')
print (df1)
    C    A         B     _merge
2  43  'a'    'blue'  left_only
0  59  'a'   'green'       both
1  56  'b'     'red'       both
2  80  'c'   'green'       both
2  72  'c'  'orange'  left_only

mask = df1['_merge'] != 'both'
df1 = df1.loc[mask, ['C']].reset_index(drop=True)
print (df1)
    C
0  43
1  72


对于第二个解决方案,按~反转布尔值掩码:


For second solution invert boolen mask by ~:

mask = FirstDF.index.isin(SecondDF.set_index(['A','B']).index)
#alternative solution
#mask = FirstDF.index.isin(list(map(tuple,SecondDF[['A','B']].values.tolist())))
df = FirstDF.loc[~mask, ['C']].reset_index(drop=True)
print (df)
    C
0  43
1  72

这篇关于在 pandas 中查询MultiIndex DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆