Python Pandas: Supporting 25 hours in datetime index


Question

I want to use a date/time as the index of a DataFrame in pandas.

However, daylight saving time is not handled properly in the database, so the day on which daylight saving time ends has 25 hourly date/time values, represented like this:

2019102700
2019102701
...
2019102724

I am using the following code to convert those values to datetime objects that I use as the index of a pandas DataFrame:

df.index = pd.to_datetime(df["date_time"], format="%Y%m%d%H")

However, this produces an error:

ValueError: unconverted data remains: 4

Presumably this is because the to_datetime function does not expect the hour to be 24. Similarly, the day on which daylight saving time starts has only 23 hours.
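The message can be reproduced with the standard library alone, which may help explain it. The sketch below is an illustration, not part of the original question; try_parse is a hypothetical helper name. The %H directive only accepts 00 to 23, so for "24" the parser takes "2" as the hour and reports the leftover "4" as unconverted data.

```python
from datetime import datetime


def try_parse(value, fmt="%Y%m%d%H"):
    """Return the parsed datetime, or the error message if parsing fails.

    try_parse is a hypothetical helper used only for this illustration.
    """
    try:
        return datetime.strptime(value, fmt)
    except ValueError as exc:
        return str(exc)


# Hour 23 is a valid value for %H and parses normally.
print(try_parse("2019102723"))

# Hour 24 is outside the 00-23 range of %H, so parsing fails.
print(try_parse("2019102724"))
```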

One solution I thought of was storing the dates as strings, but that seems neither elegant nor efficient. Is there a way to handle daylight saving time when using to_datetime?

Recommended answer

If you know the time zone, here is one way to calculate UTC timestamps. Parse only the date part, localize it to the time zone the data "belongs" to, and convert that to UTC. Then parse the hour part and add it as a timedelta, for example:

import pandas as pd 

df = pd.DataFrame({'date_time_str': ['2019102722','2019102723','2019102724',
                                     '2019102800','2019102801','2019102802']})

df['date_time'] = (pd.to_datetime(df['date_time_str'].str[:-2], format='%Y%m%d')
                   .dt.tz_localize('Europe/Berlin')
                   .dt.tz_convert('UTC'))

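# Read the trailing two digits as an hour count (00-24) and add it as a timedelta.
# pd.to_timedelta is used because casting strings with astype('timedelta64[h]')
# is not reliable across pandas versions.
df['date_time'] += pd.to_timedelta(df['date_time_str'].str[-2:].astype(int), unit='h')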

# df['date_time']
# 0   2019-10-27 20:00:00+00:00
# 1   2019-10-27 21:00:00+00:00
# 2   2019-10-27 22:00:00+00:00
# 3   2019-10-27 23:00:00+00:00
# 4   2019-10-28 00:00:00+00:00
# 5   2019-10-28 01:00:00+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
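To tie this back to the original goal of using the values as an index, here is a minimal sketch that applies the same idea to a full 25-hour day. It assumes the last two digits count hours elapsed since local midnight and that the data belongs to Europe/Berlin; to_utc_index and the value column are hypothetical names chosen for this example.

```python
import pandas as pd


def to_utc_index(stamps, tz):
    """Convert 'YYYYMMDDHH' strings to a UTC DatetimeIndex.

    Assumption: HH is the number of hours elapsed since local midnight,
    so it may run up to 24 on the day daylight saving time ends.
    """
    stamps = pd.Series(stamps)
    # Local midnight of each date, expressed in UTC.
    midnight = (pd.to_datetime(stamps.str[:-2], format="%Y%m%d")
                .dt.tz_localize(tz)
                .dt.tz_convert("UTC"))
    # Hours elapsed since local midnight.
    hours = pd.to_timedelta(stamps.str[-2:].astype(int), unit="h")
    return pd.DatetimeIndex(midnight + hours, name="date_time")


# The 25 hourly stamps of the day DST ends: 2019102700 .. 2019102724.
stamps = [f"20191027{h:02d}" for h in range(25)]
df = pd.DataFrame({"value": range(25)},
                  index=to_utc_index(stamps, "Europe/Berlin"))

print(len(df.index), df.index.is_unique)

# Converting back to local time shows the repeated wall-clock hour,
# distinguished by its UTC offset.
print(df.index.tz_convert("Europe/Berlin")[1:5])
```

Because the index is kept in UTC, every row gets a distinct timestamp even though one local wall-clock hour occurs twice. One caveat: in time zones where the clock change happens at midnight, tz_localize can raise for that date, and its ambiguous and nonexistent parameters would need to be set.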
