Convert pandas column with multiple timezones to single timezone
Problem description
I've got a column in a pandas DataFrame that contains timestamps with timezones. There are two different timezones present in this column, and I need to ensure that there's only one. Here's the output of the end of the column:
260003 2019-05-21 12:00:00-06:00
260004 2019-05-21 12:15:00-06:00
Name: timestamp, Length: 260005, dtype: object
For what it's worth, the timestamps vary between -06:00 and -07:00, and have the following output:
datetime.datetime(2007, 10, 1, 1, 0, tzinfo=tzoffset(None, -21600))
for -06:00
datetime.datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200))
for -07:00
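For reference, `tzoffset(None, -21600)` and `tzoffset(None, -25200)` are fixed offsets given in seconds east of UTC, i.e. -6 h and -7 h. A quick check, assuming the values come from `dateutil.tz.tzoffset` as the repr suggests (the variable names are illustrative):

```python
from datetime import datetime, timedelta

from dateutil.tz import tzoffset

# tzoffset(name, seconds): -21600 s = -6 h, -25200 s = -7 h
six_behind = datetime(2007, 10, 1, 1, 0, tzinfo=tzoffset(None, -21600))
seven_behind = datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200))

print(six_behind.utcoffset() == timedelta(hours=-6))    # True
print(seven_behind.utcoffset() == timedelta(hours=-7))  # True
print(six_behind.isoformat())                           # 2007-10-01T01:00:00-06:00
```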
I've been trying to use tz_localize and tz_convert, which have worked fine in the past, but I suppose that data has only ever had one timezone. E.g., if I do:
df['timestamp'].dt.tz_localize('MST', ambiguous='infer').dt.tz_convert('MST')
I get:
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Question
Is there a way to convert these to MST? Or any timezone, really? I guess I could break up the DataFrame by timezone (not 100% sure how, but I imagine it's possible) and act on chunks of it, but I figured I'd ask to see if there's a smarter solution out there. Thank you!
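The split-by-timezone idea from the question can be sketched as follows: group rows by their UTC offset so each chunk holds a single timezone, convert each chunk, then reassemble in the original order. This is a minimal sketch, assuming an object-dtype Series of `datetime` values with `dateutil` offsets like the ones shown above; the sample values and names are hypothetical.

```python
from datetime import datetime

import pandas as pd
from dateutil.tz import tzoffset

# Hypothetical object-dtype Series mixing -06:00 and -07:00 offsets
s = pd.Series([
    datetime(2019, 5, 21, 12, 0, tzinfo=tzoffset(None, -21600)),
    datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200)),
    datetime(2019, 5, 21, 12, 15, tzinfo=tzoffset(None, -21600)),
], dtype=object)

# Key each row by its UTC offset so every group has exactly one offset
offsets = s.map(lambda ts: ts.utcoffset())

chunks = []
for _, chunk in s.groupby(offsets):
    # One offset per chunk, so to_datetime can build a single tz-aware dtype
    chunks.append(pd.to_datetime(chunk).dt.tz_convert("MST"))

# Put the converted rows back in their original order
converted = pd.concat(chunks).sort_index()
print(converted)
```

This works, but the answer below avoids the split entirely by normalizing everything to UTC first.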
Recommended answer
I tried:
df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00',
'2019-05-21 12:15:00-07:00']})
df['timestamp'] = pd.to_datetime(df.timestamp)
df.timestamp.dt.tz_localize('MST')
This works fine and gives:
0 2019-05-21 18:00:00-07:00
1 2019-05-21 19:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]
Isn't that what you expected?
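One caveat about the output above: 18:00 and 19:15 are the UTC clock readings of 12:00-06:00 and 12:15-07:00, so tagging them -07:00 appears to relabel them rather than convert them (12:00-06:00 expressed in MST would be 11:00-07:00). `tz_localize` attaches a zone to a naive wall-clock time without moving the clock, while `tz_convert` keeps the instant and changes the clock reading. A small illustration with a single hypothetical timestamp:

```python
import pandas as pd

# A naive wall-clock time with no zone attached
naive = pd.Timestamp("2019-05-21 18:00")

# tz_localize only labels the clock reading; it stays 18:00
labelled = naive.tz_localize("MST")

# tz_convert needs an aware timestamp; it keeps the instant and shifts the clock
shifted = naive.tz_localize("UTC").tz_convert("MST")

print(labelled)  # 2019-05-21 18:00:00-07:00
print(shifted)   # 2019-05-21 11:00:00-07:00
```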
Thanks to @G.Anderson's comment, I tried different data with timezone-aware timestamps:
df = pd.DataFrame({'timestamp':[pd.to_datetime('2019-05-21 12:00:00').tz_localize('MST'),
pd.to_datetime('2019-05-21 12:15:00').tz_localize('EST')]})
Then
df['timestamp'] = pd.to_datetime(df.timestamp)
did give the same error. Then I added utc=True:
df.timestamp = pd.to_datetime(df.timestamp, utc=True)
# df.timestamp
# 0 2019-05-21 19:00:00+00:00
# 1 2019-05-21 17:15:00+00:00
# Name: timestamp, dtype: datetime64[ns, UTC]
df.timestamp.dt.tz_convert('MST')
This works fine and gives:
0 2019-05-21 12:00:00-07:00
1 2019-05-21 10:15:00-07:00
Name: timestamp, dtype: datetime64[ns, MST]
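Applied to data shaped like the question's column (an object column of `datetime` values carrying different `tzoffset`s), the same two steps look like this. A minimal sketch; the three sample rows are hypothetical stand-ins for the real 260,005 rows. Note that `MST` is a fixed UTC-7 zone with no daylight saving; a region name such as `America/Denver` would switch to MDT in summer.

```python
from datetime import datetime

import pandas as pd
from dateutil.tz import tzoffset

# Hypothetical rows mixing -06:00 and -07:00, like the question's column
df = pd.DataFrame({"timestamp": [
    datetime(2019, 5, 21, 12, 0, tzinfo=tzoffset(None, -21600)),
    datetime(2019, 5, 21, 12, 15, tzinfo=tzoffset(None, -21600)),
    datetime(2007, 11, 17, 5, 15, tzinfo=tzoffset(None, -25200)),
]})

# utc=True maps every offset onto UTC, giving one tz-aware dtype,
# after which tz_convert can move the whole column to MST
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True).dt.tz_convert("MST")
print(df["timestamp"])
```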