Pandas DataFrames with NaNs equality comparison


Question

In the context of unit testing some functions, I'm trying to establish the equality of two DataFrames using Python pandas:

ipdb> expect
                            1   2
2012-01-01 00:00:00+00:00 NaN   3
2013-05-14 12:00:00+00:00   3 NaN

ipdb> df
identifier                  1   2
timestamp
2012-01-01 00:00:00+00:00 NaN   3
2013-05-14 12:00:00+00:00   3 NaN

ipdb> df[1][0]
nan

ipdb> df[1][0], expect[1][0]
(nan, nan)

ipdb> df[1][0] == expect[1][0]
False

ipdb> df[1][1] == expect[1][1]
True

ipdb> type(df[1][0])
<type 'numpy.float64'>

ipdb> type(expect[1][0])
<type 'numpy.float64'>

ipdb> (list(df[1]), list(expect[1]))
([nan, 3.0], [nan, 3.0])

ipdb> df1, df2 = (list(df[1]), list(expect[1])) ;; df1 == df2
False

Given that I'm trying to test all of expect against all of df, including the NaN positions, what am I doing wrong?

What is the simplest way to compare Series/DataFrames for equality when they contain NaNs?

Recommended answer

You can use assert_frame_equal with check_names=False (so that the index/columns names are not checked). It raises an AssertionError if the frames are not equal:

In [11]: from pandas.testing import assert_frame_equal

In [12]: assert_frame_equal(df, expect, check_names=False)
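The comparisons in the question return False because NaN never compares equal to itself under IEEE 754 floating-point rules, so any element-by-element == check fails at the NaN positions. The sketch below illustrates this with frames that are assumed to resemble the ones in the question (the data, index values, and variable names are reconstructed for illustration, not taken from the asker's code):

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames mirroring the question: NaNs in matching positions.
index = pd.to_datetime(["2012-01-01 00:00:00", "2013-05-14 12:00:00"], utc=True)
expect = pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]}, index=index)

# Same values, but with named index and columns, as in the question's df.
df = expect.rename_axis(index="timestamp", columns="identifier")

# NaN is never equal to itself, so scalar and list comparisons fail.
nan_equals_itself = df[1].iloc[0] == expect[1].iloc[0]
lists_equal = list(df[1]) == list(expect[1])

# assert_frame_equal treats NaNs in the same location as equal;
# check_names=False skips the index/columns name comparison.
assert_frame_equal(df, expect, check_names=False)
```

If the last call returns without raising, the frames are considered equal for testing purposes, even though the two plain == checks above both evaluate as false.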

You can wrap this in a function, with something like:

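from pandas.testing import assert_frame_equal

def frames_equal(df, expected):  # hypothetical helper name
    """Return True if the frames match, treating aligned NaNs as equal."""
    try:
        assert_frame_equal(df, expected, check_names=False)
        return True
    except AssertionError:
        return False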


In more recent versions of pandas, this functionality is available as the .equals method:

df.equals(expect)
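As a rough illustration of how .equals differs from an elementwise == comparison, the sketch below uses small made-up frames (the names left, right, ints, and floats are assumptions for this example). Note that .equals returns a boolean rather than raising, and that it also requires the column dtypes to match:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [np.nan, 3.0], "b": [3.0, np.nan]})
right = left.copy()

# NaNs in matching positions count as equal for .equals.
same = left.equals(right)

# Elementwise == still reports the NaN cells as unequal.
elementwise_all = bool((left == right).all().all())

# .equals also requires matching dtypes column by column,
# so an int64 column does not equal a float64 column with the same values.
ints = pd.DataFrame({"a": [1, 2]})
floats = pd.DataFrame({"a": [1.0, 2.0]})
dtype_sensitive = ints.equals(floats)
```

In a unit test, assert_frame_equal is often the more convenient choice because its failure message describes where the frames differ, whereas .equals only reports True or False.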
