How to remove a pandas dataframe from another dataframe
Question
How can I remove one pandas dataframe from another, just like set subtraction:
a=[1,2,3,4,5]
b=[1,5]
a-b=[2,3,4]
Now we have two pandas dataframes. How do we remove df2 from df1?
In [5]: df1=pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b'])
In [6]: df1
Out[6]:
   a  b
0  1  2
1  3  4
2  5  6
In [9]: df2=pd.DataFrame([[1,2],[5,6]],columns=['a','b'])
In [10]: df2
Out[10]:
   a  b
0  1  2
1  5  6
The expected result of df1-df2 is:
In [14]: df
Out[14]:
   a  b
0  3  4
How can this be done?
Thanks.
Recommended answer
Solution
Use pd.concat followed by drop_duplicates(keep=False):
pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
The result looks like:
   a  b
1  3  4
Explanation
pd.concat stacks the DataFrames by appending one right after the other. Any row that appears in both df1 and df2 then occurs more than once, so drop_duplicates can detect it. By default, however, drop_duplicates keeps the first occurrence and removes only the later ones. Here we want every copy of a duplicated row removed, and the keep=False parameter does exactly that.
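To make the difference between the default behaviour and keep=False concrete, here is a minimal sketch using the df1 and df2 from the question. The intermediate names stacked, kept_first and result are illustrative only and do not come from the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Stack df1 on top of two copies of df2 (row labels repeat after concat).
stacked = pd.concat([df1, df2, df2])

# Default keep='first': one copy of every distinct row survives.
kept_first = stacked.drop_duplicates()

# keep=False: every row that occurs more than once is dropped entirely.
result = stacked.drop_duplicates(keep=False)

print(kept_first)
print(result)
```

With the default setting the rows shared with df2 are still present once, which is why keep=False is needed to get the subtraction.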
A special note on the repeated df2
With only one copy of df2, any row in df2 that is not in df1 is not considered a duplicate and remains in the result. The version with a single df2 therefore works only when df2 is a subset of df1. If we concat df2 twice, however, every row of df2 is guaranteed to be a duplicate and is subsequently removed.
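The effect of concatenating df2 once versus twice can be checked with a small sketch. The frame df2_extra is a hypothetical variant of df2, made up for illustration, that contains one row, (7, 8), which is not in df1.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])

# Hypothetical df2 that is NOT a subset of df1: the row (7, 8) is new.
df2_extra = pd.DataFrame([[1, 2], [7, 8]], columns=['a', 'b'])

# One copy: (7, 8) occurs only once, so it is not a duplicate and leaks through.
once = pd.concat([df1, df2_extra]).drop_duplicates(keep=False)

# Two copies: every row of df2_extra occurs at least twice and is removed.
twice = pd.concat([df1, df2_extra, df2_extra]).drop_duplicates(keep=False)

print(once)
print(twice)
```

One caveat with this approach: if df1 itself contains repeated rows, keep=False drops those as well, even when they do not appear in df2.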