How to properly handle datetime comparisons in an entire DataFrame with NaT values?

Question

I stumbled upon this odd behavior when trying to check whether a DataFrame has values above a certain date, where that DataFrame may also contain pd.NaT.

Comparisons of scalar values behave as expected:

import pandas as pd

pd.NaT > pd.to_datetime('2018-10-15')
# False

Comparisons with a Series also behave as expected:

s = pd.Series([pd.NaT, pd.to_datetime('2018-10-16')])
s > pd.to_datetime('2018-10-15')

#0    False
#1     True
#dtype: bool

Comparisons with a DataFrame, however, give the wrong result for the NaT entry:

s.to_frame() > pd.to_datetime('2018-10-15')
#      0
#0  True
#1  True

It seems to me that the comparison initially returns NaN, which is then (at some point?) coerced to True. Compare the operator and method forms:

df = pd.DataFrame([[pd.NaT, pd.to_datetime('2018-10-16')],
                   [pd.to_datetime('2018-10-16'), pd.NaT]])

df >= pd.to_datetime('2018-10-15')
#      0     1
#0  True  True
#1  True  True

df.ge(pd.to_datetime('2018-10-15'))
#     0    1
#0  NaN  1.0
#1  1.0  NaN
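
As a side note on the coercion suspected above: NaN is truthy in both plain Python and NumPy, so any step that casts a NaN result to bool would produce True. The following sketch only illustrates that general fact; it does not show what pandas did internally, which the question leaves open.

```python
import numpy as np

# NaN is a non-zero float, so it converts to True in plain Python.
nan_as_bool = bool(float('nan'))

# The same holds when a NumPy float array is cast to bool.
arr_as_bool = np.array([np.nan, 1.0]).astype(bool)

print(nan_as_bool)   # True
print(arr_as_bool)   # [ True  True]
```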

So can we really not use the >, <, >= and <= operators when comparing a DataFrame, and instead need to rely on .lt, .gt, .le and .ge followed by .fillna(0)?

df.ge(pd.to_datetime('2018-10-15')).fillna(0)
#     0    1
#0  0.0  1.0
#1  1.0  0.0
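
Two ways to get a boolean mask rather than floats can be sketched as follows. The first adds .astype(bool) to the .fillna(0) workaround; the second compares one column at a time, relying on the Series comparison that behaves as expected above. The names cutoff, mask_fillna and mask_by_column are illustrative only, and neither variant is presented as the recommended pandas idiom.

```python
import pandas as pd

cutoff = pd.to_datetime('2018-10-15')
df = pd.DataFrame([[pd.NaT, pd.to_datetime('2018-10-16')],
                   [pd.to_datetime('2018-10-16'), pd.NaT]])

# Variant 1: method form, NaN replaced by 0, then cast to bool.
mask_fillna = df.ge(cutoff).fillna(0).astype(bool)

# Variant 2: compare each column as a Series, where NaT compares as False.
mask_by_column = df.apply(lambda col: col >= cutoff)

print(mask_fillna)
print(mask_by_column)
```

Both masks mark the NaT cells as False, so either can be passed to df.where or used for counting with .sum().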


Recommended answer

This was a bug in pandas that will be fixed in the next release (0.24.0). On a 0.24.0 development build, all three forms agree:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.0.dev0+1504.g9642fea9c'

In [2]: s = pd.Series([pd.NaT, pd.to_datetime('2018-10-16')])

In [3]: s > pd.to_datetime('2018-10-15')
Out[3]:
0    False
1     True
dtype: bool

In [4]: s.to_frame() > pd.to_datetime('2018-10-15')
Out[4]:
       0
0  False
1   True

In [5]: df = pd.DataFrame([[pd.NaT, pd.to_datetime('2018-10-16')],
   ...:                    [pd.to_datetime('2018-10-16'), pd.NaT]])
   ...:

In [6]: df >= pd.to_datetime('2018-10-15')
Out[6]:
       0      1
0  False   True
1   True  False

In [7]: df.ge(pd.to_datetime('2018-10-15'))
Out[7]:
       0      1
0  False   True
1   True  False

For the corresponding GitHub issue, see: https://github.com/pandas-dev/pandas/issues/22242
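
To check whether a given pandas installation shows the fixed behavior, one option is to compare the operator and method forms against the column-wise Series result. This is a minimal sketch; the variable names are illustrative, and it tests only the example from the question, not every NaT comparison.

```python
import pandas as pd

cutoff = pd.to_datetime('2018-10-15')
df = pd.DataFrame([[pd.NaT, pd.to_datetime('2018-10-16')],
                   [pd.to_datetime('2018-10-16'), pd.NaT]])

by_operator = df >= cutoff                        # operator form
by_method = df.ge(cutoff)                         # method form
by_column = df.apply(lambda col: col >= cutoff)   # Series comparison per column

# True only if every cell matches the column-wise result.
consistent = bool((by_operator == by_column).all().all()
                  and (by_method == by_column).all().all())

print(pd.__version__, consistent)
```

If consistent is False, the column-wise comparison is the one to trust, since the Series behavior was correct even before the fix.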
