Equality in Pandas DataFrames - Column Order Matters?
Question
As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to Pandas:
import pandas
df1 = pandas.DataFrame(index = [1,2,3,4])
df2 = pandas.DataFrame(index = [1,2,3,4])
df1['A'] = [1,2,3,4]
df1['B'] = [2,3,4,5]
df2['B'] = [2,3,4,5]
df2['A'] = [1,2,3,4]
df1 == df2
Result:
Exception: Can only compare identically-labeled DataFrame objects
I believe the expression df1 == df2 should evaluate to a DataFrame containing all True values. Obviously it's debatable what the correct functionality of == should be in this context. My question is: Is there a Pandas method that does what I want? That is, is there a way to do equality comparison that ignores column order?
Recommended answer
You could sort the columns using sort_index:
df1.sort_index(axis=1) == df2.sort_index(axis=1)
This will evaluate to a DataFrame of all True values.
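For a unit test, a single boolean is usually more convenient than a frame of booleans. A minimal sketch, reusing the data from the question: the helper name `same_values_ignoring_column_order` is hypothetical, and it assumes both frames share the same index and the same set of column labels (otherwise `==` raises).

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"B": [2, 3, 4, 5], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])

def same_values_ignoring_column_order(left, right):
    # Hypothetical helper: sort columns by label, compare elementwise,
    # then collapse the boolean frame to a single Python bool.
    elementwise = left.sort_index(axis=1) == right.sort_index(axis=1)
    return bool(elementwise.all().all())

result = same_values_ignoring_column_order(df1, df2)
print(result)
```

The first `.all()` reduces each column to one boolean and the second reduces those to a single value.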
As @osa comments, this fails for NaNs and isn't particularly robust either. In practice, using something similar to @quant's answer is probably recommended (note: we want a bool rather than a raised exception if there's an issue):
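# pandas.util.testing is deprecated (removed in pandas 2.0); use pandas.testing
from pandas.testing import assert_frame_equal

def my_equal(df1, df2):
    # Sort columns by label so that column order does not affect the result
    try:
        assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True)
        return True
    except (AssertionError, ValueError, TypeError):  # perhaps something else?
        return False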
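Two points may be worth illustrating. `assert_frame_equal` treats NaNs in matching positions as equal, whereas `==` does not. It also accepts `check_like=True`, which ignores the order of both index and columns, so no explicit sort is needed. The sketch below is illustrative only: the helper name `frames_equal_unordered` and the sample frames are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

def frames_equal_unordered(left, right):
    # Hypothetical helper: True if the frames match, ignoring the order
    # of rows (by index label) and columns.
    try:
        assert_frame_equal(left, right, check_like=True)
        return True
    except AssertionError:
        return False

a = pd.DataFrame({"A": [1.0, np.nan], "B": [2.0, 3.0]})
b = pd.DataFrame({"B": [2.0, 3.0], "A": [1.0, np.nan]})

# Elementwise == yields False wherever both sides are NaN.
naive = bool((a.sort_index(axis=1) == b.sort_index(axis=1)).all().all())
robust = frames_equal_unordered(a, b)
print(naive, robust)
```

Note that `check_like=True` also ignores row order by index label. If row order matters for the test, sorting only the columns as in `my_equal` above may be the better fit.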