Check if pandas dataframe is subset of other dataframe
Question
I have two Python pandas DataFrames, A and B, with the same columns (but, obviously, different data). I want to check whether A is a subset of B, that is, whether all rows of A are contained in B.
Any ideas how to do this?
Recommended answer
The method DataFrame.merge(another_DF) merges on the intersection of the columns by default (it uses all columns with the same names in both DataFrames) and uses how='inner'. So if every row of A also appears in B, the inner join should have the same number of rows as A (provided neither DataFrame has duplicate rows):
len(A.merge(B)) == len(A)
PS: this will NOT work properly if one of the DataFrames has duplicate rows; see below for such cases.
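To illustrate why duplicates break the row-count check, here is a minimal sketch with made-up frames (they are not from the question). In both cases the duplicate rows are in B: they inflate the inner join, which can either hide a row of A that is missing from B or make a genuine subset fail the check.

```python
import pandas as pd

# Illustrative data: the row (7, 7, 7) of A does not occur in B, but B
# contains (1, 2, 3) twice, so the inner join still has 2 rows.
A = pd.DataFrame({"A": [1, 7], "B": [2, 7], "C": [3, 7]})
B = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [3, 3]})
false_positive = len(A.merge(B)) == len(A)

# Every row of A2 occurs in B2, but (4, 5, 6) appears twice in B2,
# so the join has 3 rows instead of 2 and the check reports False.
A2 = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
B2 = pd.DataFrame({"A": [4, 1, 4], "B": [5, 2, 5], "C": [6, 3, 6]})
false_negative = len(A2.merge(B2)) == len(A2)

print(false_positive)  # True, although A is not a subset of B
print(false_negative)  # False, although A2 is a subset of B2
```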
Demo:
In [128]: A
Out[128]:
A B C
0 1 2 3
1 4 5 6
In [129]: B
Out[129]:
A B C
0 4 5 6
1 1 2 3
2 9 8 7
In [130]: len(A.merge(B)) == len(A)
Out[130]: True
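For readers who want to run the demo outside IPython, here is a self-contained sketch that rebuilds the same two frames. Passing on= and how= explicitly is only there to spell out what merge() already does by default; the result is the same as plain A.merge(B).

```python
import pandas as pd

# The two frames from the demo above.
A = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
B = pd.DataFrame({"A": [4, 1, 9], "B": [5, 2, 8], "C": [6, 3, 7]})

# Join on every shared column with an inner join: only rows of A that
# also occur in B survive.
matched = A.merge(B, on=list(A.columns), how="inner")

# With no duplicates in either frame, equal lengths mean all rows matched.
is_subset = len(matched) == len(A)
print(is_subset)  # True
```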
For data sets containing duplicates, we can drop the duplicates and use the same method:
In [136]: A
Out[136]:
A B C
0 1 2 3
1 4 5 6
2 1 2 3
In [137]: B
Out[137]:
A B C
0 4 5 6
1 1 2 3
2 9 8 7
3 4 5 6
In [138]: A.merge(B).drop_duplicates()
Out[138]:
A B C
0 1 2 3
2 4 5 6
In [139]: len(A.merge(B).drop_duplicates()) == len(A.drop_duplicates())
Out[139]: True
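The duplicate-tolerant check can be wrapped in a small reusable function. The sketch below uses a hypothetical helper name, is_row_subset, and is a slight variation on the answer: it drops duplicates from both frames before the join rather than deduplicating the join result, which gives the same answer because each distinct row of A can then match at most one row of B. It assumes both frames share the same column names.

```python
import pandas as pd


def is_row_subset(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Return True if every distinct row of `a` also occurs in `b`.

    Hypothetical helper; assumes `a` and `b` have the same column names.
    """
    a_unique = a.drop_duplicates()
    # Deduplicating b means the join cannot be inflated by repeated rows,
    # so its length is exactly the number of distinct rows of a found in b.
    matched = a_unique.merge(b.drop_duplicates())
    return len(matched) == len(a_unique)


# The frames from In [136] and In [137] above.
A = pd.DataFrame({"A": [1, 4, 1], "B": [2, 5, 2], "C": [3, 6, 3]})
B = pd.DataFrame({"A": [4, 1, 9, 4], "B": [5, 2, 8, 5], "C": [6, 3, 7, 6]})

print(is_row_subset(A, B))  # True
print(is_row_subset(B, A))  # False: (9, 8, 7) does not occur in A
```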