How to compare two dataframes ignoring column names?
Question
Suppose I want to compare the content of two dataframes, but not the column names (or index names). Is it possible to achieve this without renaming the columns?
For example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_diff = pd.DataFrame({'A': [1, 2], 'B': [3, 5]})
In this case, df is equal to df_equal but different from df_diff, because the values in df_equal match those in df, while the values in df_diff do not. Notice that the column names in df_equal are different, but I still want to get a True result.
I tried the following:
equals:
# Returns false because of the column names
df.equals(df_equal)
eq:
# Doesn't work: it compares four columns (A, B, a, b), assuming nulls for the ones that don't exist
df.eq(df_equal).all().all()
pandas.testing.assert_frame_equal:
# Same problem as equals: raises AssertionError because the column names differ
pd.testing.assert_frame_equal(df, df_equal, check_names=False)
I thought it would be possible to use assert_frame_equal, but none of its parameters seem to ignore column names.
Recommended answer
pd.DataFrame is built around pd.Series, and its comparison methods align on labels, so it's unlikely you will be able to perform comparisons in pandas itself without the column names being taken into account.
But the most efficient way would be to drop down to numpy:
assert_equal = (df.values == df_equal.values).all()
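One caveat with the one-liner above: it assumes both frames have the same shape. A minimal sketch that checks the shape first is shown here. The helper name values_equal is hypothetical, and to_numpy() is used in place of .values because the pandas documentation recommends it; the comparison itself is the same.

```python
import pandas as pd

def values_equal(left, right):
    """Compare two DataFrames by position, ignoring column and index labels."""
    a, b = left.to_numpy(), right.to_numpy()
    # Frames of different shapes can never be equal; checking first
    # avoids relying on NumPy broadcasting behaviour.
    if a.shape != b.shape:
        return False
    return bool((a == b).all())

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_diff = pd.DataFrame({'A': [1, 2], 'B': [3, 5]})

print(values_equal(df, df_equal))  # same values, different column names
print(values_equal(df, df_diff))   # one value differs
```

Like the one-liner, this sketch treats NaN as unequal to NaN.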
To deal with np.nan, you can use np.testing.assert_equal and catch AssertionError, as suggested by @Avaris:
import numpy as np

def nan_equal(a, b):
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True

assert_equal = nan_equal(df.values, df_equal.values)
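To illustrate why the NaN handling matters, here is a small sketch with made-up frames (df_nan and df_nan_renamed are illustrative names, not from the question). The plain == comparison reports a mismatch because NaN is never equal to NaN, while nan_equal treats NaNs in the same position as equal. np.array_equal with equal_nan=True is included as a possible alternative, assuming NumPy 1.19 or later and numeric dtypes.

```python
import numpy as np
import pandas as pd

def nan_equal(a, b):
    # np.testing.assert_equal treats NaNs in matching positions as equal
    try:
        np.testing.assert_equal(a, b)
    except AssertionError:
        return False
    return True

# Two frames with identical values (including a NaN) but different column names
df_nan = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})
df_nan_renamed = pd.DataFrame({'a': [1.0, np.nan], 'b': [3.0, 4.0]})

plain = bool((df_nan.values == df_nan_renamed.values).all())
with_helper = nan_equal(df_nan.values, df_nan_renamed.values)
with_array_equal = bool(
    np.array_equal(df_nan.values, df_nan_renamed.values, equal_nan=True)
)

print(plain, with_helper, with_array_equal)
```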