Filling missing date values with the least possible date in Pandas dataframe


Question

I have a dataframe with a date column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

Now I want to impute the missing date values with the smallest possible date in pandas. Imputing the current date is easy with datetime.now(), but for one particular case I want the NaN values to be imputed with the least possible date value.
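For comparison, the "easy" case mentioned above (filling with the current date) might look like this minimal sketch. It assumes you want today's date at midnight, so it uses pd.Timestamp.now().normalize() in place of datetime.now(). Converting the strings with pd.to_datetime first turns each NaN into NaT, which fillna then replaces.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

# Parse the strings first; missing entries become NaT in a datetime64 Series.
dates = pd.to_datetime(df['date'])

# Assumption: "current date" means today at midnight; normalize() drops the time of day.
today = pd.Timestamp.now().normalize()
filled = dates.fillna(today)
print(filled)
```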

Python's datetime allows a minimum date of '0001-01-01', but pandas does not accept it. When I impute this value, I get the error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

I tried searching Stack Overflow but couldn't find an answer about the minimum date that pandas accepts.

Does anyone know?

I'm not really concerned with the 'OutOfBoundsDatetime' error itself; I'm curious to know the least possible date that pandas can accept.

Recommended answer

If you want a date that plays nicely with pandas, you'll need to consider pd.Timestamp, since this is the datetime type that pandas works with.
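The lower limit follows from how pandas stores datetime64[ns] values: each one is a 64-bit integer count of nanoseconds since 1970-01-01, and the most negative int64 is reserved for NaT. The sketch below rebuilds both bounds from the int64 limits. The exact fractional seconds of pd.Timestamp.min have differed between pandas releases, so treat the printed values as those of recent versions.

```python
import numpy as np
import pandas as pd

# datetime64[ns] is an int64 count of nanoseconds since the Unix epoch.
# The most negative int64 is reserved for NaT, so the usable minimum is one above it.
lowest_ns = np.iinfo(np.int64).min + 1
highest_ns = np.iinfo(np.int64).max

# An integer passed to pd.Timestamp is read as nanoseconds since the epoch.
lower = pd.Timestamp(lowest_ns)
upper = pd.Timestamp(highest_ns)

print(lower)  # 1677-09-21 00:12:43.145224193
print(upper)  # 2262-04-11 23:47:16.854775807
print(lower == pd.Timestamp.min, upper == pd.Timestamp.max)
```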

If you don't mind your dates having a time component, use pd.Timestamp.min:

pd.Timestamp.min
# Timestamp('1677-09-21 00:12:43.145225')

pd.to_datetime(df['date'].fillna(pd.Timestamp.min))

0   2014-10-01 00:00:00.000000
1   1677-09-21 00:12:43.145225
2   2015-09-30 00:00:00.000000
3   1677-09-21 00:12:43.145225
4   1677-09-21 00:12:43.145225
5   2019-06-03 00:00:00.000000
Name: date, dtype: datetime64[ns]
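A variation on the same idea (not part of the original answer) is to convert the column with pd.to_datetime before filling. NaN entries then become NaT and the Series is datetime64 from the start, so fillna never has to mix strings and Timestamp objects in an object column. This is a minimal sketch using the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

# Convert first: NaN becomes NaT and the dtype is already datetime64.
dates = pd.to_datetime(df['date'])

# Replace NaT with the earliest Timestamp pandas can represent.
filled = dates.fillna(pd.Timestamp.min)
print(filled)
```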

If you only want the dates (without times), the smallest date with no time component is

pd.Timestamp.min.ceil('D')
# Timestamp('1677-09-22 00:00:00')

pd.to_datetime(df['date'].fillna(pd.Timestamp.min.ceil('D')))

0   2014-10-01
1   1677-09-22
2   2015-09-30
3   1677-09-22
4   1677-09-22
5   2019-06-03
Name: date, dtype: datetime64[ns]
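The choice of ceil rather than floor matters here: pd.Timestamp.min lies about 12 minutes into 1677-09-21, so midnight of that day is already out of range, and rounding up to 1677-09-22 gives the earliest midnight pandas can hold. A minimal sketch of the date-only fill follows; the name sentinel is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

# Earliest midnight that is still inside the supported range.
sentinel = pd.Timestamp.min.ceil('D')

# Convert to datetime64 (NaN -> NaT), then fill NaT with the sentinel date.
filled = pd.to_datetime(df['date']).fillna(sentinel)

# Every value is now a pure date (time of day is midnight).
print(filled.dt.date.tolist())
```

Because the sentinel is an implausible real-world date, the imputed rows can be recovered later with a mask such as filled == sentinel.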
