Python Pandas - Find difference between two data frames
I have two data frames df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) which is the difference between the two data frames?
In other words, a data frame that has all the rows/columns in df1 that are not in df2?
By using concat followed by drop_duplicates:
pd.concat([df1,df2]).drop_duplicates(keep=False)
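As a minimal sketch of this approach, assuming two small hypothetical frames (the data below is illustrative and not taken from the question) in which neither frame contains repeated rows:

```python
import pandas as pd

# Hypothetical example data: no row is repeated within either frame.
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Stack both frames, then drop every row that occurs more than once.
# keep=False removes all copies of a duplicated row, so rows shared by
# df1 and df2 disappear and only the unshared rows remain.
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df3)
```

Note that this computes a symmetric difference: it matches "rows of df1 not in df2" here only because df2 is a subset of df1.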
Update:
The above method only works for data frames that do not already contain duplicate rows themselves. For example:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})
It produces the output below, which is wrong: keep=False also drops the row (3, 4), because that row is duplicated within df1 itself.
Wrong output:
pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]:
   A  B
1  2  3
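To see which rows are affected, one can inspect the intermediate concatenated frame. The sketch below reuses the df1 and df2 from the example; the names combined, flags and is_duplicate are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

combined = pd.concat([df1, df2])

# duplicated(keep=False) flags every row that has at least one identical
# copy, whether that copy comes from df2 or from df1 itself.
flags = combined.duplicated(keep=False)
print(combined.assign(is_duplicate=flags))
```

Only the row (2, 3) is left unflagged, which is why it is the sole survivor of drop_duplicates(keep=False).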
Correct output:
Out[656]:
   A  B
1  2  3
2  3  4
3  3  4
How to achieve that?
Method 1: Using isin with tuple
df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]
Out[657]:
   A  B
1  2  3
2  3  4
3  3  4
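Method 1 can be broken into steps as in the following sketch, which reuses the example frames; the intermediate names rows1 and rows2 are introduced here for readability and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Turn each row into a tuple so that whole rows can be compared at once.
rows1 = df1.apply(tuple, axis=1)
rows2 = df2.apply(tuple, axis=1)

# Keep the rows of df1 whose tuple does not appear among df2's rows.
# Rows duplicated inside df1 are preserved, since each is tested separately.
df3 = df1[~rows1.isin(rows2)]
print(df3)
```

Because the boolean mask is applied to df1 directly, the result keeps df1's original index labels.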
Method 2: merge with indicator
df1.merge(df2, indicator=True, how='left').loc[lambda x: x['_merge'] != 'both']
Out[421]:
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
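A self-contained sketch of Method 2 follows, again on the example frames. Dropping the helper column at the end is an addition made here so that df3 has the same columns as df1; the original answer leaves _merge in place.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# A left merge on all shared columns keeps every df1 row; indicator=True
# adds a '_merge' column recording whether the row was also found in df2.
merged = df1.merge(df2, how='left', indicator=True)

# Keep the rows that were not matched in df2, then drop the helper column.
df3 = merged.loc[merged['_merge'] != 'both'].drop(columns='_merge')
print(df3)
```

The merge builds a fresh integer index, so the labels 1, 2, 3 in the result coincide with df1's labels only because df1 uses the default index.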