Flag Daylight Saving Time (DST) Hours in Pandas Date-Time Column

Question

I created an hourly dates dataframe, and now I would like to create a column that flags whether each row (hour) is in Daylight Saving Time or not. For example, in summer hours, the flag should == 1, and in winter hours, the flag should == 0.

# Localized dates dataframe
dates = pd.DataFrame(data=pd.date_range('2018-1-1', '2019-1-1', freq='h', tz='America/Denver'), columns=['date_time'])

# My failed attempt to create the flag column
dates['dst_flag'] = np.where(dates['date_time'].dt.daylight_saving_time == True, 1, 0)

Recommended answer

There's a nice link in the comments that at least lets you do this manually. AFAIK, there isn't a vectorized way to do this.

import pandas as pd
import pytz

# Generate data (as opposed to an index)
date_range = pd.date_range('1/1/2018', '1/1/2019', freq='h', tz='America/Denver')

# Localized dates dataframe
df = pd.DataFrame(data=list(date_range), columns=['date_time'])

# Map transition times to year for some efficiency gain
tz = pytz.timezone('America/Denver')
transition_times = tz._utc_transition_times[1:]
# The stored times are naive UTC, so attach UTC before converting to local time
transition_times = [pytz.utc.localize(t).astimezone(tz) for t in transition_times]
transition_times_by_year = {}
for start_time, stop_time in zip(transition_times[::2], transition_times[1::2]):
    year = start_time.year
    transition_times_by_year[year] = [start_time, stop_time]

# If the date is in DST, mark True, else False
def mark_dst(dates):
    for date in dates:
        start_dst, stop_dst = transition_times_by_year[date.year]
        # The stop time is the first instant back on standard time, so exclude it
        yield start_dst <= date < stop_dst
df['dst_flag'] = list(mark_dst(df['date_time']))

# Do a quick sanity check to make sure we did this correctly for year 2018
dst_times = df.loc[df['dst_flag'], 'date_time']
dst_start = dst_times.iloc[0]  # First DST time 2018
dst_end = dst_times.iloc[-1]  # Last DST time 2018
print(dst_start)
print(dst_end)

This outputs:

2018-03-11 03:00:00-06:00
2018-11-04 01:00:00-06:00

which matches the 2018 DST period for America/Denver: clocks jumped from 02:00 MST to 03:00 MDT on March 11 and fell back from 02:00 MDT to 01:00 MST on November 4. You can verify the dates with a quick Google search.

  1. pd.date_range generates an index, not data. I changed your original code slightly to make it data rather than the index. I assume you have the data already.
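
To illustrate the index-versus-data point, here is a minimal sketch (the variable names `rng` and `df` are only illustrative, and the one-day range is chosen to keep it short). It shows that `pd.date_range` returns a `DatetimeIndex`, and that passing it as column values gives an ordinary column with a default integer index:

```python
import pandas as pd

# pd.date_range returns a DatetimeIndex, not a column of data
rng = pd.date_range('2018-01-01', '2018-01-02', freq='h', tz='America/Denver')
print(type(rng).__name__)

# Passing the index as column values turns it into regular data;
# the frame gets a default integer index (0, 1, 2, ...)
df = pd.DataFrame({'date_time': rng})
print(df.head())
```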

tz._utc_transition_times is structured a bit oddly. It holds the UTC start/stop times of the DST transitions, but the earliest entries are irregular. It should be good from 1965 onward, though. If your dates are earlier than that, change tz._utc_transition_times[1:] to tz._utc_transition_times. Note that not every year before 1965 is present.
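
To see the irregular early entries for yourself, a small inspection sketch may help. It assumes your pytz version still exposes the private attributes `_utc_transition_times` and `_transition_info` (the latter is not used in the answer's code, and both may change between releases):

```python
from pytz import timezone

tz = timezone('America/Denver')

# Private pytz attributes: names and layout are not guaranteed across versions
times = tz._utc_transition_times  # naive datetimes, expressed in UTC
info = tz._transition_info        # one (utcoffset, dst, tzname) tuple per transition

# Print the first few transitions to see where the regular
# start/stop alternation begins
for when, (utcoffset, dst, name) in list(zip(times, info))[:8]:
    print(when, name, 'DST' if dst else 'standard')
```

The first entry is a placeholder far in the past, which is why the answer's code slices it off with `[1:]`.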

tz._utc_transition_times is "Python private". It is liable to change without warning or notice, and may or may not work with future or past versions of pytz. I'm using pytz version 2017.3. I recommend you run this code to make sure the output matches, and if not, make sure to use version 2017.3.
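
If relying on a private attribute is a concern, one alternative (not the method used in this answer) is to ask each timestamp for its DST offset through the public `Timestamp.dst()` method, which returns a zero timedelta in standard time and a non-zero one during DST. This is a row-by-row sketch rather than a truly vectorized operation, and the column names simply follow the question:

```python
from datetime import timedelta

import pandas as pd

# Hourly, timezone-aware timestamps for 2018 (same range as the question)
dates = pd.DataFrame({
    'date_time': pd.date_range('2018-01-01', '2019-01-01', freq='h', tz='America/Denver')
})

# Timestamp.dst() gives the DST adjustment in effect at that instant:
# zero during standard time, non-zero during daylight saving time
dates['dst_flag'] = dates['date_time'].map(lambda ts: int(ts.dst() != timedelta(0)))

print(dates['dst_flag'].value_counts())
```

Because the offset comes from the timestamp's own timezone information, no transition table has to be built or paired up by hand.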

HTH, good luck with your research/regression problem!
