比较两个 pandas 数据框的差异 [英] Comparing two pandas dataframes for differences

查看:35
本文介绍了比较两个 pandas 数据框的差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个脚本可以更新5到10列的数据,但有时起始csv与结束csv相同,因此我不想写相同的csvfile,而是希望它不执行任何操作...

I've got a script updating 5-10 columns worth of data , but sometimes the start csv will be identical to the end csv so instead of writing an identical csvfile I want it to do nothing...

如何比较两个数据框以检查它们是否相同?

How can I compare two dataframes to check if they're the same or not?

csvdata = pandas.read_csv('csvfile.csv')
csvdata_old = csvdata

# ... do stuff with csvdata dataframe

if csvdata_old != csvdata:
    csvdata.to_csv('csvfile.csv', index=False)

有什么想法吗?

推荐答案

您还需要注意创建DataFrame的副本,否则csvdata_old将使用csvdata更新(因为它指向同一个对象):

You also need to be careful to create a copy of the DataFrame, otherwise the csvdata_old will be updated with csvdata (since it points to the same object):

csvdata_old = csvdata.copy()

要检查它们是否相等,您可以使用assert_frame_equal,如该答案所示:

To check whether they are equal, you can use assert_frame_equal as in this answer:

from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)

您可以将其包装在函数中,例如:

You can wrap this in a function with something like:

try:
    assert_frame_equal(csvdata, csvdata_old)
    return True
except:  # appeantly AssertionError doesn't catch all
    return False

有人在讨论更好的方法...

这篇关于比较两个 pandas 数据框的差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆