How to use pandas isin for multiple columns
Question
I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are both in the second dataframe.
These rows should be in the result dataframe:
pizza, boy
pizza, girl
ice cream, boy
because all three rows are in both the first and second dataframes.
How can I accomplish this? I was thinking of using isin, but I am not sure how to use it when I have to consider more than one column.
Recommended answer
Perform an inner merge on col1 and col2:
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))
yields
            col1  col2
index
10         pizza   boy
11         pizza  girl
16     ice cream   boy
The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then
pd.merge(df2, df1, how='inner')
#         col1  col2
# 0      pizza   boy
# 1      pizza  girl
# 2  ice cream   boy
suffices.
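One caveat with the merge approach: an inner merge emits one output row per matching pair, so if df1 held the same (col1, col2) combination more than once, the matching df2 rows would be repeated. The following is a minimal sketch of guarding against that. The duplicated row in df1 is made-up data for illustration, and passing on= explicitly and calling drop_duplicates are choices made here, not part of the answer above.

```python
import pandas as pd

keys = ['col1', 'col2']

# Illustrative data: df1 deliberately repeats the ('pizza', 'boy') pair.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy']},
                   index=[10, 11, 12, 16])

# Without de-duplication, df2's ('pizza', 'boy') row is emitted twice.
naive = pd.merge(df2.reset_index(), df1,
                 how='inner', on=keys).set_index('index')

# Dropping duplicate key pairs from df1 first keeps each df2 row at most once.
deduped = pd.merge(df2.reset_index(), df1[keys].drop_duplicates(),
                   how='inner', on=keys).set_index('index')

print(naive)
print(deduped)
```

If df1 is known to have unique key pairs, as in the data above, the extra drop_duplicates call changes nothing.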
Alternatively, you could construct MultiIndexes out of the col1 and col2 columns, and then call the MultiIndex.isin method:
index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
print(df2.loc[index2.isin(index1)])
yields
         col1  col2
10      pizza   boy
11      pizza  girl
16  ice cream   boy
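For comparison, chaining Series.isin column by column tests each column independently, so it can accept a pair whose values never occur together in df1. Here is a minimal sketch of the difference. The data, including the ('ice cream', 'girl') row, is hypothetical and chosen only to expose the mismatch, and the variable names are placeholders.

```python
import pandas as pd

keys = ['col1', 'col2']

# Illustrative data: df1 has no ('ice cream', 'girl') row.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza', 'ice cream'],
                    'col2': ['boy', 'girl', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'ice cream', 'ice cream', 'cake'],
                    'col2': ['girl', 'girl', 'boy', 'boy']})

# Column-wise check: 'ice cream' occurs in df1.col1 and 'girl' occurs in
# df1.col2, so ('ice cream', 'girl') passes although df1 has no such row.
columnwise = df2['col1'].isin(df1['col1']) & df2['col2'].isin(df1['col2'])

# Pair-wise check: MultiIndex.isin compares whole (col1, col2) tuples.
index1 = pd.MultiIndex.from_arrays([df1[col] for col in keys])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in keys])
pairwise = index2.isin(index1)

print(columnwise.tolist())  # [True, True, True, False]
print(pairwise.tolist())    # [True, False, True, False]

# Negating the mask selects the df2 rows with no matching pair in df1.
print(df2.loc[~pairwise])
```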