How to compare two dataframes ignoring column names?


Problem description

Suppose I want to compare the content of two dataframes, but not the column names (or index names). Is it possible to achieve this without renaming the columns?

For example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_diff = pd.DataFrame({'A': [1, 2], 'B': [3, 5]})

In this case, df is equal to df_equal but different from df_diff, because the values in df_equal match those in df, while the values in df_diff do not. Notice that the column names in df_equal are different, but I still want to get a True result.

I tried the following:

equals:

# Returns false because of the column names
df.equals(df_equal)

eq:

# doesn't work as it compares four columns (A,B,a,b) assuming nulls for the one that doesn't exist
df.eq(df_equal).all().all()

pandas.testing.assert_frame_equal:

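# fails like equals: raises AssertionError because the column labels differ
# (check_names only controls the names attribute of the index and columns)
pd.testing.assert_frame_equal(df, df_equal, check_names=False)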

I thought it would be possible to use assert_frame_equal, but none of its parameters seem to ignore column names.

Recommended answer

pd.DataFrame is built around pd.Series, so it's unlikely you will be able to perform comparisons without column names.
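As a small illustration of why the labels get in the way, the sketch below reuses the question's df and df_equal and shows that element-wise comparison aligns on column labels before comparing values. The variable name aligned is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# eq aligns both frames on the union of their column labels first,
# so each frame is missing half of the resulting columns.
aligned = df.eq(df_equal)

print(sorted(aligned.columns))     # all four labels appear
print(bool(aligned.values.any()))  # no cell compares equal
```

Because no label is present in both frames, every comparison is made against a missing value and comes out False, even though the underlying numbers are identical.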

But the most efficient way would be to drop down to numpy:

assert_equal = (df.values == df_equal.values).all()
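One caveat with a plain == on arrays is that frames of different but broadcastable shapes are compared after broadcasting, which can give a misleading result. A minimal sketch of a shape-safe variant, assuming np.array_equal is acceptable here, might look like this. The helper values_equal and the frame df_short are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_diff = pd.DataFrame({'A': [1, 2], 'B': [3, 5]})
df_short = pd.DataFrame({'A': [1], 'B': [3]})  # hypothetical frame with fewer rows


def values_equal(left, right):
    """Compare only the underlying values, ignoring index and column labels."""
    # np.array_equal returns False when the shapes differ instead of broadcasting.
    return bool(np.array_equal(left.to_numpy(), right.to_numpy()))


print(values_equal(df, df_equal))  # same values, different labels
print(values_equal(df, df_diff))   # one value differs
print(values_equal(df, df_short))  # shapes differ
```

This keeps the comparison label-free while making a shape mismatch an explicit False. It does not treat NaN values as equal to each other.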

To deal with np.nan, you can use np.testing.assert_equal and catch AssertionError, as suggested by @Avaris:

import numpy as np

def nan_equal(a, b):
    # np.testing.assert_equal treats NaNs in matching positions as equal
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True

assert_equal = nan_equal(df.values, df_equal.values)
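If relabeling a temporary copy is acceptable (the original frames stay untouched), another option is to stay within pandas. The sketch below is an assumption rather than part of the original answer: it gives a copy of the second frame the first frame's labels and then relies on DataFrame.equals, which treats NaNs in the same location as equal. The helper name equal_ignoring_labels and the NaN frames are hypothetical.

```python
import numpy as np
import pandas as pd


def equal_ignoring_labels(left, right):
    """Compare two frames by position, ignoring index and column labels."""
    if left.shape != right.shape:
        return False
    # set_axis returns a new object, so `right` itself is not renamed.
    relabeled = right.set_axis(left.columns, axis=1).set_axis(left.index, axis=0)
    # equals also requires matching dtypes per column.
    return left.equals(relabeled)


df_nan = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})
df_nan_equal = pd.DataFrame({'a': [1.0, np.nan], 'b': [3.0, 4.0]})
df_nan_diff = pd.DataFrame({'a': [1.0, np.nan], 'b': [3.0, 5.0]})

print(equal_ignoring_labels(df_nan, df_nan_equal))
print(equal_ignoring_labels(df_nan, df_nan_diff))
print(list(df_nan_equal.columns))  # original labels are unchanged
```

Compared with the numpy approach, this version is stricter about dtypes, since DataFrame.equals requires corresponding columns to share a dtype.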
