pandas.to_datetime gives OutOfBoundsDatetime Error
Question
I have data in some format which I want to read into a pandas.DataFrame. Some rows give me an error. Below is a minimal example for one of those strings, but I have several where it does not work (and, strangely enough, some where it does).
The exact error is:
OutOfBoundsDatetime, Out of bounds nanosecond timestamp: 2276-02-18 05:15:13
import pandas as pd
pd.to_datetime('02/18/2276 5:15:13 AM', format='%m/%d/%Y %I:%M:%S %p')
I used this site to make my format string: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Period.strftime.html
Thanks for your help!
Recommended answer
This is out of bounds because the datetime dtype is datetime64[ns], which has an upper bound in the year 2262 (see the docs). If you change to a lower resolution, it can handle this datetime, but unfortunately you can't do this within pandas. As datetimes are stored natively as datetime64[ns], you'd have to do this within numpy or using a normal datetime.
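As a rough sketch of the "numpy or normal datetime" route, the string from the question can be parsed with the standard library and then held at second resolution in NumPy. The variable names here are illustrative only, and this assumes the values do not need to live in a datetime64[ns] column.

```python
from datetime import datetime

import numpy as np
import pandas as pd

raw = "02/18/2276 5:15:13 AM"
fmt = "%m/%d/%Y %I:%M:%S %p"

# A plain Python datetime supports years 1 through 9999, so parsing succeeds.
parsed = datetime.strptime(raw, fmt)

# At second resolution, NumPy covers a far wider range than datetime64[ns].
as_seconds = np.datetime64(parsed.isoformat(), "s")

# The nanosecond upper bound that the error message refers to.
ns_limit = pd.Timestamp.max

print(parsed)
print(as_seconds)
print(ns_limit)
```

The parsed values can still be kept in a DataFrame as an object-dtype column of Python datetimes, at the cost of losing the datetime accessor methods.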
Another method is to store the year in a separate column if it's outside of the bounds, and set the year value to 1900 or some other indicator that the year is out of bounds.
However, this has performance issues, as you lose some vectorised operations.
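One possible way to sketch the separate-year-column idea is shown below. The column names (raw, year, when), the regular expressions, and the conservative 1678 to 2261 window are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

FMT = "%m/%d/%Y %I:%M:%S %p"
PLACEHOLDER_YEAR = 1900  # sentinel suggested in the answer

df = pd.DataFrame({"raw": ["02/18/2276 5:15:13 AM", "03/01/2020 11:00:00 PM"]})

# Extract the four-digit year from the MM/DD/YYYY prefix into its own column.
df["year"] = df["raw"].str.extract(r"^\d{2}/\d{2}/(\d{4})", expand=False).astype(int)

# Years 1678 through 2261 lie entirely inside the datetime64[ns] range.
in_bounds = df["year"].between(1678, 2261)

# Keep in-range strings as they are; swap the year for the placeholder otherwise.
patched = df["raw"].where(
    in_bounds,
    df["raw"].str.replace(r"/\d{4} ", f"/{PLACEHOLDER_YEAR} ", regex=True),
)
df["when"] = pd.to_datetime(patched, format=FMT)

print(df[["year", "when"]])
```

The real year then has to be read from the year column rather than from when. Note that 1900 is not a leap year, so a February 29 date would fail to parse with this placeholder; a leap-year sentinel such as 2000 would avoid that.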