Comparing two pandas dataframes for differences
Question
I have a script that updates 5-10 columns of data, but sometimes the starting CSV is identical to the ending CSV, so instead of writing an identical CSV file I want it to do nothing.
How can I compare two dataframes to check whether they are the same?
csvdata = pandas.read_csv('csvfile.csv')
csvdata_old = csvdata
# ... do stuff with csvdata dataframe
if csvdata_old != csvdata:
    csvdata.to_csv('csvfile.csv', index=False)
Any ideas?
Recommended answer
You also need to be careful to create a copy of the DataFrame. Otherwise csvdata_old will change whenever csvdata does, because both names point to the same object:
csvdata_old = csvdata.copy()
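As a minimal sketch of why the copy matters, the snippet below contrasts plain assignment with `.copy()`. The small DataFrame is a hypothetical stand-in for the CSV data in the question, and the names `alias` and `snapshot` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the data read from 'csvfile.csv'.
csvdata = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

alias = csvdata            # plain assignment: a second name for the same object
snapshot = csvdata.copy()  # independent copy of the data

# Modify the original in place, as the update script would.
csvdata.loc[0, "a"] = 100

print(alias is csvdata)      # the alias is the very same object
print(alias.loc[0, "a"])     # so it sees the new value
print(snapshot.loc[0, "a"])  # the copy still holds the old value
```

Comparing `csvdata` against `alias` could therefore never detect a change, while comparing against `snapshot` can.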
To check whether they are equal, you can use assert_frame_equal from pandas' testing utilities:
# pandas.util.testing is deprecated; the public location is pandas.testing
from pandas.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)
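To make the behaviour concrete, here is a small sketch (with made-up frames named `left`, `same`, and `different`) showing that `assert_frame_equal` returns silently when the frames match and raises `AssertionError` when a value differs.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical example frames.
left = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
same = left.copy()
different = left.copy()
different.loc[1, "b"] = 5.0

# Equal frames: the call returns None and raises nothing.
assert_frame_equal(left, same)

# Differing values: the call raises AssertionError describing the mismatch.
try:
    assert_frame_equal(left, different)
    raised = False
except AssertionError as err:
    raised = True
    print(err)
```

Because the function signals inequality by raising rather than by returning a value, it needs a try/except around it before it can be used in an `if` condition.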
You can wrap this in a function with something like:
def frames_equal(df1, df2):  # hypothetical helper name
    try:
        assert_frame_equal(df1, df2)
        return True
    except Exception:  # apparently catching AssertionError alone doesn't cover every case
        return False
There was discussion of a better way...
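The answer does not say what that better way turned out to be. One option pandas does provide is `DataFrame.equals`, which returns a plain boolean; whether it is the approach that discussion had in mind is an assumption. The sketch below uses a hypothetical helper, `write_if_changed`, and an in-memory buffer in place of 'csvfile.csv'.

```python
import io

import pandas as pd


def write_if_changed(new, old, target):
    """Write `new` as CSV to `target` only if it differs from `old`.

    Hypothetical helper; returns True if a write happened.
    """
    if new.equals(old):
        return False
    new.to_csv(target, index=False)
    return True


# Hypothetical stand-in for the data read from the CSV file.
csvdata = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
csvdata_old = csvdata.copy()

buffer = io.StringIO()  # stands in for the output file

# Nothing has changed yet, so nothing is written.
unchanged_written = write_if_changed(csvdata, csvdata_old, buffer)

# After an update the frames differ, so the CSV is written.
csvdata.loc[0, "a"] = 10
changed_written = write_if_changed(csvdata, csvdata_old, buffer)

print(unchanged_written, changed_written)
```

Note that `equals` is strict about dtypes: two frames holding the same numbers as int64 and float64 compare as different, while NaNs in matching positions count as equal.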