Comparing two dataframes and getting the differences

Published: 2021/12/3 8:44:37 Tags: python pandas dataframe


Problem description


I have two dataframes. Examples:

df1:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green

df2:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple  22.1 Red
2013-11-25 Orange  8.6 Orange

Each dataframe has the Date as an index. Both dataframes have the same structure.

What I want to do is compare these two dataframes and find which rows are in df2 but not in df1. I want to compare the date (index) and the first column (Banana, Apple, etc.) to see whether they exist in df2 vs df1.

I have tried the following:

For the first approach I get this error: "Exception: Can only compare identically-labeled DataFrame objects". I have tried removing the Date as index but get the same error.

With the third approach, the assert returns False, but I cannot figure out how to actually see the rows that differ.

Any pointers would be welcome.

Solution

The df1 != df2 approach works only for dataframes with identical row and column labels. All of the dataframes' axes are compared with the _indexed_same method, and an exception is raised if any difference is found, even if only the order of the columns or index differs.
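To make the failure mode concrete, here is a minimal sketch with two small, made-up frames (not the question's exact data). It assumes a reasonably recent pandas, where the error is raised as a ValueError; the older version used in the question reported it as a plain Exception.

```python
import pandas as pd

# Hypothetical miniature frames: df2 has one more row label than df1.
df1 = pd.DataFrame({"Fruit": ["Banana", "Orange"]},
                   index=["2013-11-24", "2013-11-25"])
df2 = pd.DataFrame({"Fruit": ["Banana", "Orange", "Apple"]},
                   index=["2013-11-24", "2013-11-25", "2013-11-26"])

# The row labels differ, so the element-wise comparison refuses to run.
try:
    df1 != df2
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

# With identical labels on both axes the same operator works element-wise
# and returns a boolean frame (all False here, since the values are equal).
same_labels = df1 != df1.copy()
```

Removing Date as the index does not help on its own, because the default integer indexes of the two frames still have different lengths.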

If I understand you correctly, you are not looking for changed values but for the symmetric difference, i.e. the rows that appear in only one of the two dataframes. One approach is to concatenate the dataframes:

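>>> # Date must be a regular column here: drop=True discards the old index.
>>> # If Date is the index, call reset_index() on df1 and df2 first.
>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)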

group by all columns, so that identical rows end up in the same group:

>>> df_gpby = df.groupby(list(df.columns))

get the index of the records that occur only once:

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

and filter:

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red
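The steps above can be put together as a self-contained sketch. It rebuilds the question's sample data with Date as the index (dates kept as plain strings for simplicity) and moves Date into a column before comparing. The helper names make_frames, symmetric_difference and rows_only_in_second are hypothetical and not part of the original answer. The second helper is an alternative not given in the answer: it uses merge with indicator=True to return only the rows of df2 whose (Date, Fruit) pair is missing from df1, which is what the question literally asks for.

```python
import pandas as pd


def make_frames():
    """Rebuild the question's sample data with Date as the index."""
    cols = ["Date", "Fruit", "Num", "Color"]
    rows1 = [
        ("2013-11-24", "Banana", 22.1, "Yellow"),
        ("2013-11-24", "Orange", 8.6, "Orange"),
        ("2013-11-24", "Apple", 7.6, "Green"),
        ("2013-11-24", "Celery", 10.2, "Green"),
    ]
    rows2 = rows1 + [
        ("2013-11-25", "Apple", 22.1, "Red"),
        ("2013-11-25", "Orange", 8.6, "Orange"),
    ]
    df1 = pd.DataFrame(rows1, columns=cols).set_index("Date")
    df2 = pd.DataFrame(rows2, columns=cols).set_index("Date")
    return df1, df2


def symmetric_difference(df1, df2):
    """Rows that occur exactly once across both frames (all columns compared)."""
    # reset_index() keeps Date as a column so it takes part in the comparison;
    # the final reset_index(drop=True) only renumbers the rows 0..n-1.
    df = pd.concat([df1.reset_index(), df2.reset_index()]).reset_index(drop=True)
    groups = df.groupby(list(df.columns)).groups
    # Each group maps one distinct row to the labels where it occurs.
    idx = sorted(labels[0] for labels in groups.values() if len(labels) == 1)
    return df.loc[idx]


def rows_only_in_second(df1, df2, keys=("Date", "Fruit")):
    """Rows of df2 whose key columns have no match in df1 (alternative approach)."""
    keys = list(keys)
    left = df2.reset_index()
    # Deduplicate the keys so the left merge cannot multiply rows of df2.
    right = df1.reset_index()[keys].drop_duplicates()
    merged = left.merge(right, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


df1, df2 = make_frames()
sym = symmetric_difference(df1, df2)
only_in_df2 = rows_only_in_second(df1, df2)
```

On the sample data both helpers return the two 2013-11-25 rows. They differ in general: the groupby version compares every column and also reports rows found only in df1, and it skips any row that is duplicated inside a single frame, whereas the merge version looks only at the chosen key columns.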
