Diff between two dataframes in pandas

Question

I have two dataframes, both of which have the same basic schema (4 date fields, a couple of string fields, and 4-5 float fields). Call them df1 and df2.

What I want to do is basically get a "diff" of the two: all rows that are not shared between the two dataframes (not in the set intersection). Note that the two dataframes need not be the same length.

I tried using pandas.merge(how='outer'), but I was not sure which column to pass in as the 'key', since there really isn't one, and none of the combinations I tried worked. It is also possible that df1 or df2 contains two (or more) identical rows.

What is a good way to do this in pandas/Python?

Recommended answer

Try this:

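import pandas as pd
# Outer-join on all shared columns; the 'Exist' column records left_only / right_only / both
diff_df = pd.merge(df1, df2, how='outer', indicator='Exist')
# Keep only the rows that are not present in both dataframes
diff_df = diff_df.loc[diff_df['Exist'] != 'both']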
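
To try the answer end to end, here is a minimal self-contained sketch. The frames, the column names (trade_date, ticker, price) and the values are hypothetical, chosen only to mimic the question's mix of date, string and float fields. Because no "on" argument is passed, pd.merge joins on every column the two frames have in common, so a row counts as shared only when all of its fields match.

```python
import pandas as pd

# Hypothetical sample frames sharing one schema: a date, a string and a float column
df1 = pd.DataFrame({
    "trade_date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "ticker": ["AAA", "BBB", "CCC"],
    "price": [1.5, 2.5, 3.5],
})
df2 = pd.DataFrame({
    "trade_date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-04"]),
    "ticker": ["BBB", "CCC", "DDD"],
    "price": [2.5, 3.75, 4.5],
})

# No 'on' argument: merge uses all common columns as the join key
diff_df = pd.merge(df1, df2, how="outer", indicator="Exist")
# Drop the rows found in both frames, leaving the symmetric difference
diff_df = diff_df.loc[diff_df["Exist"] != "both"]
print(diff_df)
```

Here the AAA row (only in df1) and the DDD row (only in df2) come back, and CCC comes back twice, once from each side, because its price differs (3.5 vs 3.75). The BBB row, identical in both frames, is dropped. Float columns are compared exactly, so values that differ only by rounding error are treated as different rows.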

You will get a dataframe of all rows that do not exist in both df1 and df2, that is, the rows found in only one of the two.
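
The Exist column also records which frame each remaining row came from ('left_only' for df1, 'right_only' for df2), so the diff can be split in two. The sketch below, again on hypothetical data with made-up column names, also shows how the approach treats the duplicate rows mentioned in the question: a row that appears twice in df1 and not at all in df2 comes back twice.

```python
import pandas as pd

# Hypothetical frames: ("x", 1.0) appears twice in df1 and never in df2
df1 = pd.DataFrame({"name": ["x", "x", "y"], "value": [1.0, 1.0, 2.0]})
df2 = pd.DataFrame({"name": ["y", "z"], "value": [2.0, 3.0]})

diff_df = pd.merge(df1, df2, how="outer", indicator="Exist")
diff_df = diff_df.loc[diff_df["Exist"] != "both"]

# Split the diff by origin and drop the helper column
only_in_df1 = diff_df.loc[diff_df["Exist"] == "left_only"].drop(columns="Exist")
only_in_df2 = diff_df.loc[diff_df["Exist"] == "right_only"].drop(columns="Exist")
print(only_in_df1)
print(only_in_df2)
```

One caveat: duplicates are not counted against each other. If a row appeared twice in df1 and once in df2, every copy would match and be marked 'both', so the row would not appear in the diff at all. This is a set-style difference, not a count-aware one.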