Python pandas tz_localize throws NonExistentTimeError, then unable to drop erroneous times


Question

In Python pandas, I have a dataset that looks like this:

For data before 2007-04-26 17:00:00, the time zone is US/Eastern. For data after that, the time zone is America/Chicago.

When I run this command:

data.index = data[:'2007-04-26 16:59:59'].index.tz_localize('US/Eastern', ambiguous = 'NaT').tz_convert('Europe/London')

I see an error message:

NonExistentTimeError: 2006-04-02 02:00:00

This is indeed because of daylight saving time. I have the same problem for 2007. I don't have the problem for subsequent years. Ideally, I'd like two commands: one that converts the first half of the dataset from Eastern to London, and another that converts the second half from Chicago to London.

Since that didn't work, I tried dropping these times (an hour, I believe), e.g. 02:00:00 to 03:00:00, where the daylight saving transition occurs. However, when I run

data.drop(data.ix['2005-04-03 2:00:00':'2005-04-03 3:00:00'], inplace=True)

I get

ValueError: labels ['open' 'high' 'low' 'close' 'volume'] not contained in axis

Does anyone know how I can simply convert these times? Any help would be greatly appreciated.

Thanks, Alex

Update to add more info:

OK, I've used the following code, which worked to drop the offending times:

Update 2:

from datetime import datetime

# Keep only rows outside each excluded window (2005 and 2006 DST transitions)
mask = (
    ((data.index < datetime.strptime("2006-04-02 02:00:00", "%Y-%m-%d %H:%M:%S")) | (data.index > datetime.strptime("2006-04-02 03:00:00", "%Y-%m-%d %H:%M:%S")))
    & ((data.index < datetime.strptime("2005-04-03 02:00:00", "%Y-%m-%d %H:%M:%S")) | (data.index > datetime.strptime("2005-04-03 03:00:00", "%Y-%m-%d %H:%M:%S")))
    & ((data.index < datetime.strptime("2005-10-30 01:00:00", "%Y-%m-%d %H:%M:%S")) | (data.index > datetime.strptime("2005-10-30 02:00:00", "%Y-%m-%d %H:%M:%S")))
    & ((data.index < datetime.strptime("2006-10-29 01:00:00", "%Y-%m-%d %H:%M:%S")) | (data.index > datetime.strptime("2006-10-29 02:00:00", "%Y-%m-%d %H:%M:%S")))
)
data_filtered = data[mask]
data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')

But now I get this error:

    data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')
Traceback (most recent call last):

  File "<ipython-input-38-0fc8a9e68588>", line 1, in <module>
    data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')

  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1955, in __setattr__
    object.__setattr__(self, name, value)

AttributeError: can't set attribute

Any ideas on this? I did some Googling but couldn't find anything really related.

Recommended answer

Your drop command doesn't look like it should work based on the description in the docs. To get rid of the offending times, I would create a mask on the dataframe, i.e.:

from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
# Keep only rows outside the nonexistent hour; add more years as further clauses joined with &
mask = ((df.index < datetime.strptime("2006-04-02 02:00:00", fmt))
        | (df.index > datetime.strptime("2006-04-02 03:00:00", fmt)))

df_filtered = df[mask]
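
As a rough sketch of how the mask idea might extend to several years and then feed the conversion from the question: the frame `df`, its `close` column, and the `gaps` list below are hypothetical stand-ins, not the asker's data. The converted frame is bound to a new name rather than assigned to `.ix`, which is an indexer property and is the likely source of the "can't set attribute" error.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame spanning the 2006 US spring-forward date
idx = pd.date_range("2006-04-02 00:00", periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"close": np.arange(6.0)}, index=idx)

# (start, end) of each window to exclude; extend with further years as needed
gaps = [
    ("2005-04-03 02:00:00", "2005-04-03 03:00:00"),
    ("2006-04-02 02:00:00", "2006-04-02 03:00:00"),
]

# Start with every row kept, then AND in one "outside this window" test per gap
mask = np.ones(len(df), dtype=bool)
for start, end in gaps:
    outside = (df.index < pd.Timestamp(start)) | (df.index > pd.Timestamp(end))
    mask &= outside

df_filtered = df[mask]

# tz_localize/tz_convert return new objects, so bind the result to a new name
df_london = df_filtered.tz_localize("US/Eastern").tz_convert("Europe/London")
```

The same pattern could be applied to the America/Chicago half of the data by slicing the frame at the changeover timestamp first and localizing each slice with its own zone.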

Probably there's a way to make drop work too. Check this related question: Deleting rows of daylight saving time from a time indexed pandas dataframe
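
One way drop might be made to work, sketched under assumptions: the original call passed a whole DataFrame slice to `drop`, and iterating a DataFrame yields its column names, which would explain the "labels ['open' ...] not contained in axis" message. Passing the index labels of the slice instead should remove the rows. The frame below is a hypothetical stand-in, and `.loc` is used because `.ix` no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical hourly frame spanning the 2005 US spring-forward date
idx = pd.date_range("2005-04-03 00:00", periods=6, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"open": range(6), "close": range(6)}, index=idx)

# String slicing with .loc on a sorted DatetimeIndex includes both endpoints
bad_rows = data.loc["2005-04-03 02:00:00":"2005-04-03 03:00:00"]

# Hand drop() the row labels (timestamps), not the sliced frame itself
data_clean = data.drop(bad_rows.index)
```

On pandas 0.24 and later, `tz_localize` also accepts a `nonexistent` argument (for example `nonexistent='NaT'` or `nonexistent='shift_forward'`), which may make the manual filtering unnecessary.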
