Python pandas tz_localize throws NonExistentTimeError, then unable to drop erroneous times
Question
In Python pandas, I have a dataset that looks like this:
For data before 2007-04-26 17:00:00, the time zone is US/Eastern. For data after that, the time zone is America/Chicago.
When I run this command:
data.index = data[:'2007-04-26 16:59:59'].index.tz_localize('US/Eastern', ambiguous = 'NaT').tz_convert('Europe/London')
I see an error message:
NonExistentTimeError: 2006-04-02 02:00:00
This is indeed because of daylight saving time. I have the same problem for 2007. I don't have the problem for subsequent years. Ideally, I'd like two commands: one that converts the first half of the dataset from Eastern to London, and another that converts the second half from Chicago to London.
Since that didn't work, I tried dropping these times (one hour, I believe), e.g. 02:00:00 to 03:00:00 on the days when daylight saving time started. However, when I run
data.drop(data.ix['2005-04-03 2:00:00':'2005-04-03 3:00:00'], inplace=True)
I get
ValueError: labels ['open' 'high' 'low' 'close' 'volume'] not contained in axis
Does anyone know how I can simply convert these times? Any help would be greatly appreciated.
Thanks, Alex
Update to add more info:
OK, I've used the following code, which worked to drop the offending times:
Update 2:
mask = ((data.index<datetime.strptime("2006-04-02 02:00:00","%Y-%m-%d %H:%M:%S")) | (data.index>datetime.strptime("2006-04-02 03:00:00","%Y-%m-%d %H:%M:%S"))) & ((data.index<datetime.strptime("2005-04-03 02:00:00","%Y-%m-%d %H:%M:%S")) | (data.index>datetime.strptime("2005-04-03 03:00:00","%Y-%m-%d %H:%M:%S"))) & ((data.index<datetime.strptime("2005-10-30 01:00:00","%Y-%m-%d %H:%M:%S")) | (data.index>datetime.strptime("2005-10-30 02:00:00","%Y-%m-%d %H:%M:%S"))) & ((data.index<datetime.strptime("2006-10-29 01:00:00","%Y-%m-%d %H:%M:%S")) | (data.index>datetime.strptime("2006-10-29 02:00:00","%Y-%m-%d %H:%M:%S")))
data_filtered = data[mask]
data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')
But now I get this error:
data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')
Traceback (most recent call last):
File "<ipython-input-38-0fc8a9e68588>", line 1, in <module>
data_filtered.ix = data_filtered.tz_localize('US/Eastern', infer_dst=True).tz_convert('Europe/London')
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1955, in __setattr__
object.__setattr__(self, name, value)
AttributeError: can't set attribute
Any ideas on this? I did some Googling but couldn't find anything really related.
Recommended answer
Based on the description in the docs, your drop command doesn't look like it should work. To get rid of the offending times, I would create a mask on the dataframe, i.e.:
from datetime import datetime
# Keep only rows strictly outside the skipped hour on 2006-04-02.
# For more years, add one (before | after) clause per transition and join them with &.
mask = (df.index < datetime.strptime("2006-04-02 02:00:00", "%Y-%m-%d %H:%M:%S")) | (df.index > datetime.strptime("2006-04-02 03:00:00", "%Y-%m-%d %H:%M:%S"))
df_filtered = df[mask]
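As a minimal sketch of how the mask fits together with the time zone conversion, the example below builds a small hypothetical hourly frame (the sample index and the `close` values are made up for illustration, not the asker's data), masks out the skipped hour, and then localizes and converts. It binds the converted frame to a new name (`df_london`, also hypothetical) rather than assigning to `.ix`, which is what appears to trigger the AttributeError in the question.

```python
from datetime import datetime

import pandas as pd

# Hypothetical hourly sample spanning the 2006 spring-forward gap in US/Eastern.
idx = pd.date_range("2006-04-02 00:00:00", periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"close": range(6)}, index=idx)

# Keep rows strictly outside 02:00-03:00, the hour that does not exist that day.
gap_start = datetime.strptime("2006-04-02 02:00:00", "%Y-%m-%d %H:%M:%S")
gap_end = datetime.strptime("2006-04-02 03:00:00", "%Y-%m-%d %H:%M:%S")
df_filtered = df[(df.index < gap_start) | (df.index > gap_end)]

# tz_localize and tz_convert return a new object, so bind the result to a name
# instead of assigning it to the .ix attribute.
df_london = df_filtered.tz_localize("US/Eastern").tz_convert("Europe/London")
print(df_london)
```

The same pattern could be applied separately to the slice before 2007-04-26 17:00:00 and to the slice after it, each with its own source zone. In pandas 0.24 and later, `tz_localize` also accepts a `nonexistent` argument (for example `nonexistent="NaT"`), which may make the manual mask unnecessary.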
There's probably a way to make drop work too. Check this related question: Deleting rows of daylight saving time from a time indexed pandas dataframe
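One possible way to make drop work, sketched under assumptions: the ValueError in the question suggests that the sliced DataFrame itself was passed to drop, so its column names were looked up as row labels. Passing the index of the slice instead should avoid that. The sample frame below is hypothetical, and `.loc` is used because `.ix` was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical hourly sample around the 2005 spring-forward date.
idx = pd.date_range("2005-04-03 00:00:00", periods=6, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"open": range(6), "close": range(6)}, index=idx)

# Label-based slicing on a DatetimeIndex includes both endpoints,
# so this selects the 02:00 and 03:00 rows.
to_drop = data.loc["2005-04-03 02:00:00":"2005-04-03 03:00:00"].index

# drop expects row labels, so pass the index of the slice, not the slice itself.
data = data.drop(to_drop)
print(data)
```

Note that this removes the 03:00:00 row as well, matching the strict comparisons in the mask approach. If that row should be kept, the slice could end at "2005-04-03 02:59:59" instead.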