Diff between two dataframes in pandas
Question
I have two dataframes, both of which have the same basic schema (4 date fields, a couple of string fields, and 4-5 float fields). Call them df1 and df2.
What I want to do is basically get a "diff" of the two - where I get back all rows that are not shared between the two dataframes (not in the set intersection). Note, the two dataframes need not be the same length.
I tried using pandas.merge(how='outer'), but I was not sure what column to pass in as the 'key', as there really isn't one, and the various combinations I tried were not working. It is possible that df1 or df2 has two (or more) rows that are identical.
What is a good way to do this in pandas/Python?
Recommended answer
Try this:
import pandas as pd
# No key is passed, so the merge joins on every column the two frames share.
diff_df = pd.merge(df1, df2, how='outer', indicator='Exist')
diff_df = diff_df.loc[diff_df['Exist'] != 'both']
You will have a dataframe of all rows that do not exist in both df1 and df2, that is, the rows found in only one of the two.
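To see what the result looks like, here is a small self-contained sketch. The column names (date, name, value) and the sample rows are made up for illustration and are not from the question; only the pd.merge call with how='outer' and indicator='Exist' comes from the answer. The split into only_in_df1 and only_in_df2 is an optional extra step.

```python
import pandas as pd

# Hypothetical sample data: same schema in both frames, different lengths,
# and one row duplicated inside df1.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03"]),
    "name": ["a", "a", "b", "c"],
    "value": [1.0, 1.0, 2.0, 3.0],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-04"]),
    "name": ["b", "c", "d"],
    "value": [2.0, 3.0, 4.0],
})

# With no `on` argument, merge joins on all columns the frames have in common.
# The indicator column is filled with 'left_only', 'right_only', or 'both'.
merged = pd.merge(df1, df2, how="outer", indicator="Exist")

# Keep only the rows that are not present in both frames.
diff_df = merged.loc[merged["Exist"] != "both"]

# Optional: split the diff by the frame each row came from.
only_in_df1 = diff_df.loc[diff_df["Exist"] == "left_only"].drop(columns="Exist")
only_in_df2 = diff_df.loc[diff_df["Exist"] == "right_only"].drop(columns="Exist")

print(diff_df)
```

In this sketch the duplicated row in df1 stays duplicated in the diff, because it has no match in df2. A row that appears in both frames is dropped even if it occurs a different number of times in each. Float columns are compared for exact equality, so values that differ only by rounding would be reported as differences.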