Comparing two dataframes and getting the differences
Problem description
I have two dataframes. Examples:
df1:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
df2:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple 22.1 Red
2013-11-25 Orange 8.6 Orange
Each dataframe has the Date as an index. Both dataframes have the same structure.
What I want to do is compare these two dataframes and find which rows are in df2 but not in df1. I want to compare the date (index) and the first column (Banana, Apple, etc.) to see whether each combination exists in df2 as well as in df1.
I have tried the following:
- Outputting difference in two pandas dataframes side by side - highlighting the difference
- Comparing two pandas dataframes for differences
For the first approach I get this error: "Exception: Can only compare identically-labeled DataFrame objects". I have tried removing the Date as index but get the same error.
On the third approach, the assertion returns False, but I cannot figure out how to actually see the rows that differ.
Any pointers would be welcome.
The approach df1 != df2 works only for dataframes with identical rows and columns. In fact, the axes of both dataframes are compared with the _indexed_same method, and an exception is raised if any difference is found, even a difference in the order of the columns or the index.
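To make the failure mode concrete, here is a minimal sketch. The two small frames are made-up stand-ins for the ones in the question, and it assumes a reasonably recent pandas, where the exception is a ValueError (the question's older version reported a plain Exception):

```python
import pandas as pd

# Hypothetical, shortened versions of the frames from the question.
df1 = pd.DataFrame(
    {"Fruit": ["Banana", "Orange"], "Num": [22.1, 8.6]},
    index=pd.Index(["2013-11-24", "2013-11-24"], name="Date"),
)
df2 = pd.DataFrame(
    {"Fruit": ["Banana", "Orange", "Apple"], "Num": [22.1, 8.6, 22.1]},
    index=pd.Index(["2013-11-24", "2013-11-24", "2013-11-25"], name="Date"),
)

# The frames have different row labels, so the elementwise comparison is refused.
try:
    df1 != df2
    comparison_failed = False
except ValueError as exc:
    comparison_failed = True
    print(exc)

# With identical labels the comparison works and returns a boolean frame.
same_shape_diff = df1 != df1.copy()
print(same_shape_diff)
```

So the operator is useful for spotting changed cells between frames of the same shape, not for finding rows that exist in only one frame.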
If I understand you correctly, you are not looking for changed values but for the symmetric difference, that is, the rows that appear in only one of the two dataframes. One approach is to concatenate the dataframes:
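>>> import pandas as pd
>>> df = pd.concat([df1, df2])
>>> # Here Date is a regular column. If Date is the index, use
>>> # df.reset_index() instead so the date is kept as a column.
>>> df = df.reset_index(drop=True)
Group by all columns:
>>> df_gpby = df.groupby(list(df.columns))
Get the index of the records that occur only once:
>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
Filter:
>>> df.reindex(idx)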
Date Fruit Num Color
9 2013-11-25 Orange 8.6 Orange
8 2013-11-25 Apple 22.1 Red
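The question asks specifically for the rows of df2 that are missing from df1, matched on Date and Fruit only. As an alternative to the groupby approach above (not part of the original answer), a left merge with indicator=True can serve as an anti-join. The sketch below rebuilds the frames from the question; the variable names keys, left, right, merged and only_in_df2 are illustrative:

```python
import pandas as pd

# Frames rebuilt from the question, with Date as the index.
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
}).set_index("Date")

df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
}).set_index("Date")

# Columns that decide whether a row counts as "already in df1".
keys = ["Date", "Fruit"]

# Move Date out of the index so it can be used as a merge key.
left = df2.reset_index()
# Keep only the key columns of df1, de-duplicated so the merge cannot multiply rows.
right = df1.reset_index()[keys].drop_duplicates()

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge.
merged = left.merge(right, on=keys, how="left", indicator=True)

# Rows flagged "left_only" exist in df2 but have no (Date, Fruit) match in df1.
only_in_df2 = (
    merged[merged["_merge"] == "left_only"]
    .drop(columns="_merge")
    .set_index("Date")
)
print(only_in_df2)
```

Unlike the symmetric difference, this returns rows from df2 only, and a row whose Num or Color changed but whose Date and Fruit match is treated as present in both.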