Equality in Pandas DataFrames - Column Order Matters?

Question

As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to Pandas:

import pandas
df1 = pandas.DataFrame(index = [1,2,3,4])
df2 = pandas.DataFrame(index = [1,2,3,4])
df1['A'] = [1,2,3,4]
df1['B'] = [2,3,4,5]
df2['B'] = [2,3,4,5]
df2['A'] = [1,2,3,4]
df1 == df2

Result:

Exception: Can only compare identically-labeled DataFrame objects

I believe the expression df1 == df2 should evaluate to a DataFrame containing all True values. Obviously it's debatable what the correct functionality of == should be in this context. My question is: Is there a Pandas method that does what I want? That is, is there a way to do equality comparison that ignores column order?

Recommended answer

You could sort the columns using sort_index:

df1.sort_index(axis=1) == df2.sort_index(axis=1)

This will evaluate to a dataframe of all True values.
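
As a minimal sketch, the sorted comparison can be run on the frames from the question and collapsed to a single boolean. The `.all().all()` reduction and the variable names `elementwise` and `all_equal` are additions for illustration, not part of the original answer:

```python
import pandas as pd

# Same data as in the question, with the columns created in a different order.
df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"B": [2, 3, 4, 5], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])

# Sorting the column labels gives both frames the same column order,
# so the element-wise == comparison no longer raises.
elementwise = df1.sort_index(axis=1) == df2.sort_index(axis=1)

# Collapse the boolean frame: True only if every cell compared equal.
all_equal = bool(elementwise.all().all())
print(all_equal)
```

The first `.all()` reduces each column to one boolean and the second reduces those to a single value, which is usually more convenient in a unit test than a whole frame of booleans.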

As @osa comments, this fails for NaNs and isn't particularly robust either. In practice, using something similar to @quant's answer is probably recommended (note: we want a bool returned rather than an exception raised if there's an issue):

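def my_equal(df1, df2):
    # assert_frame_equal lives in pandas.testing; older pandas releases
    # exposed it as pandas.util.testing.
    from pandas.testing import assert_frame_equal
    try:
        # Sort the columns of both frames so column order is ignored.
        assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True)
        return True
    except (AssertionError, ValueError, TypeError):  # perhaps something else?
        return False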
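
To illustrate the NaN caveat, here is a hedged sketch that contrasts the element-wise comparison with an assertion-based check. The helper name `frames_match` and the sample frames are hypothetical. It relies on the `check_like=True` option of `assert_frame_equal`, which ignores the order of index and columns, as an alternative to sorting by hand:

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left, right):
    """Hypothetical helper: True if the frames are equal, ignoring row/column order."""
    try:
        # check_like=True makes the assertion ignore index and column order.
        assert_frame_equal(left, right, check_like=True)
        return True
    except AssertionError:
        return False


a = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
b = pd.DataFrame({"B": [3.0, 4.0], "A": [1.0, np.nan]})  # same data, columns swapped
c = pd.DataFrame({"B": [3.0, 4.0], "A": [1.0, 2.0]})     # differs in column A

# Element-wise == treats NaN as unequal to NaN, so this reduction is False
# even though a and b hold the same data.
naive = bool((a.sort_index(axis=1) == b.sort_index(axis=1)).all().all())

print(naive)               # False
print(frames_match(a, b))  # True
print(frames_match(a, c))  # False
```

The assertion-based check treats NaNs in matching positions as equal, which is typically what a unit test wants.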
