Filling missing date values with the least possible date in a Pandas dataframe
Problem description
I have a dataframe with a date column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})
Now I want to impute the missing date values with the least possible date value in pandas. Imputing the current date is easy with datetime.now(), but for one particular case I want the NaN values to be imputed with the least possible date.
Python's datetime allows the minimum date to be '0001-01-01' (datetime.min), but pandas does not accept the same value. When I impute this value, I get the error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
I searched Stack Overflow but couldn't find an answer about the minimum acceptable date in pandas.
Does anyone know?
I'm not really concerned with the 'OutOfBoundsDatetime' error itself; I'm curious to know the least possible date that pandas can accept.
Recommended answer
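If you want a date that plays nicely with pandas, you'll need to consider pd.Timestamp, since this is the datetime type that pandas works with. pandas' default datetime64[ns] dtype stores each value as a 64-bit count of nanoseconds, which limits the representable range to roughly the years 1677 to 2262; that is why '0001-01-01' is out of bounds.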
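The range of pd.Timestamp follows from how datetime64[ns] stores values: a signed 64-bit count of nanoseconds since 1970-01-01, with the most negative int64 reserved for NaT. A minimal sketch that rebuilds the bounds from the int64 limits with NumPy (the names lower, upper and span_years are only for illustration; the sub-second digits of pd.Timestamp.min may differ slightly between pandas versions, so treat lower as the theoretical bound):

```python
import numpy as np
import pandas as pd

# datetime64[ns] is a signed 64-bit count of nanoseconds since 1970-01-01;
# the most negative int64 is reserved for NaT, so valid values start one above it.
i64 = np.iinfo(np.int64)

upper = pd.Timestamp(np.datetime64(i64.max, "ns"))      # largest representable instant
lower = pd.Timestamp(np.datetime64(i64.min + 1, "ns"))  # smallest, one above the NaT sentinel

# Half of the int64 range expressed in years: roughly 292 years either side of 1970.
span_years = i64.max / 1e9 / 86_400 / 365.2425

print(lower, upper, round(span_years, 1))
```

So 1970 minus about 292 years lands in 1677, and 1970 plus about 292 years lands in 2262.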
If you don't mind your dates having a time component, use pd.Timestamp.min:
pd.Timestamp.min
# Timestamp('1677-09-21 00:12:43.145225')
pd.to_datetime(df['date'].fillna(pd.Timestamp.min))
0 2014-10-01 00:00:00.000000
1 1677-09-21 00:12:43.145225
2 2015-09-30 00:00:00.000000
3 1677-09-21 00:12:43.145225
4 1677-09-21 00:12:43.145225
5 2019-06-03 00:00:00.000000
Name: date, dtype: datetime64[ns]
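An alternative ordering, not from the original answer: parse the strings first so the NaN entries become NaT, then fill. This keeps the column as a datetime dtype throughout instead of handing pd.to_datetime a mix of strings and Timestamps. The explicit astype('datetime64[ns]') is an assumption added to pin nanosecond resolution, the resolution pd.Timestamp.min is defined for, in case a pandas version infers a different one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

# Parse first: the NaN entries become NaT. The astype pins nanosecond resolution
# (an assumption, to match the resolution pd.Timestamp.min belongs to).
parsed = pd.to_datetime(df['date']).astype('datetime64[ns]')

# Then fill the NaT gaps with the earliest Timestamp pandas supports.
filled = parsed.fillna(pd.Timestamp.min)

print(filled)
```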
If you only want dates (without times), the smallest date without a time component is:
pd.Timestamp.min.ceil('D')
# Timestamp('1677-09-22 00:00:00')
pd.to_datetime(df['date'].fillna(pd.Timestamp.min.ceil('D')))
0 2014-10-01
1 1677-09-22
2 2015-09-30
3 1677-09-22
4 1677-09-22
5 2019-06-03
Name: date, dtype: datetime64[ns]
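Why ceil('D') rather than floor('D'): rounding pd.Timestamp.min down to midnight would land on 1677-09-21 00:00:00, which is earlier than the lower bound and so cannot be represented. A small sketch that checks this by comparing raw nanosecond integers (Timestamp.value), so no out-of-bounds Timestamp is ever constructed; NS_PER_DAY and the flag names are helpers introduced here for illustration.

```python
import pandas as pd

NS_PER_DAY = 86_400 * 10**9  # hypothetical helper: nanoseconds in one day

first_midnight = pd.Timestamp.min.ceil('D')

# It is exactly midnight, i.e. it has no time component ...
is_midnight = first_midnight == first_midnight.normalize()
# ... it is not earlier than the lower bound ...
in_bounds = first_midnight >= pd.Timestamp.min
# ... and the midnight one day earlier falls below pd.Timestamp.min.
# Plain Python ints are compared here, so nothing overflows.
previous_out_of_bounds = first_midnight.value - NS_PER_DAY < pd.Timestamp.min.value

print(first_midnight, is_midnight, in_bounds, previous_out_of_bounds)
```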