pandas 使用单独的时区列转换日期时间 [英] Pandas convert datetime with a separate time zone column

查看:80
本文介绍了 pandas 使用单独的时区列转换日期时间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,其中一列用于时区,一列用于日期时间.我想先将这些转换为UTC并与其他数据结合,然后我将进行一些计算,最终将其从UTC转换为观看者的本地时区.

I have a dataframe with a column for the time zone and a column for the datetime. I would like to convert these to UTC first to join with other data, and then I'll have some calculations to convert from UTC to the viewers local time zone eventually.

datetime              time_zone
2016-09-19 01:29:13   America/Bogota 
2016-09-19 02:16:04   America/New_York
2016-09-19 01:57:54   Africa/Cairo

def create_utc(df, column, time_format='%Y-%m-%d %H:%M:%S'):
    timezone = df['TZ']
    df[column + '_utc'] = df[column].dt.tz_localize(timezone).dt.tz_convert('UTC').dt.strftime(time_format)
    df[column + '_utc'].replace('NaT', np.nan, inplace=True)
    df[column + '_utc'] = pd.to_datetime(df[column + '_utc'])
    return df

那是我的错误尝试.错误是事实不明确,这是有道理的,因为时区"变量引用的是列.如何引用同一行中的值?

That was my flawed attempt. The error is that the truth is ambiguous which makes sense because the 'timezone' variable is referring to a column. How do I refer to the value in the same row?

以下是一天的数据(394,000行和22个唯一时区)的以下答案的一些结果. Edit2:我添加了一个groupby示例,以防有人想要查看结果.到目前为止,它是最快的.

here are some results from the answers below on one day of data (394,000 rows and 22 unique time zones). I added a groupby example in case someone wants to see the results. It is the fastest, by far.

%%timeit

for tz in df['TZ'].unique():
    df.ix[df['TZ'] == tz, 'datetime_utc2'] = df.ix[df['TZ'] == tz, 'datetime'].dt.tz_localize(tz).dt.tz_convert('UTC')
df['datetime_utc2'] = df['datetime_utc2'].dt.tz_localize(None)

1 loops, best of 3: 1.27 s per loop

1 loops, best of 3: 1.27 s per loop

%%timeit

df['datetime_utc'] = [d['datetime'].tz_localize(d['TZ']).tz_convert('UTC') for i, d in df.iterrows()]
df['datetime_utc'] = df['datetime_utc'].dt.tz_localize(None)

1 loops, best of 3: 50.3 s per loop

1 loops, best of 3: 50.3 s per loop

df['datetime_utc'] = pd.concat([d['datetime'].dt.tz_localize(tz).dt.tz_convert('UTC') for tz, d in df.groupby('TZ')])



**1 loops, best of 3: 249 ms per loop**

推荐答案

这是向量化方法(它将循环df.time_zone.nunique()次):

Here is a vectorized approach (it will loop df.time_zone.nunique() times):

In [2]: t
Out[2]:
             datetime         time_zone
0 2016-09-19 01:29:13    America/Bogota
1 2016-09-19 02:16:04  America/New_York
2 2016-09-19 01:57:54      Africa/Cairo
3 2016-09-19 11:00:00    America/Bogota
4 2016-09-19 12:00:00  America/New_York
5 2016-09-19 13:00:00      Africa/Cairo

In [3]: for tz in t.time_zone.unique():
   ...:         mask = (t.time_zone == tz)
   ...:         t.loc[mask, 'datetime'] = \
   ...:             t.loc[mask, 'datetime'].dt.tz_localize(tz).dt.tz_convert('UTC')
   ...:

In [4]: t
Out[4]:
             datetime         time_zone
0 2016-09-19 06:29:13    America/Bogota
1 2016-09-19 06:16:04  America/New_York
2 2016-09-18 23:57:54      Africa/Cairo
3 2016-09-19 16:00:00    America/Bogota
4 2016-09-19 16:00:00  America/New_York
5 2016-09-19 11:00:00      Africa/Cairo

更新:

In [12]: df['new'] = df.groupby('time_zone')['datetime'] \
                       .transform(lambda x: x.dt.tz_localize(x.name))

In [13]: df
Out[13]:
             datetime         time_zone                 new
0 2016-09-19 01:29:13    America/Bogota 2016-09-19 06:29:13
1 2016-09-19 02:16:04  America/New_York 2016-09-19 06:16:04
2 2016-09-19 01:57:54      Africa/Cairo 2016-09-18 23:57:54
3 2016-09-19 11:00:00    America/Bogota 2016-09-19 16:00:00
4 2016-09-19 12:00:00  America/New_York 2016-09-19 16:00:00
5 2016-09-19 13:00:00      Africa/Cairo 2016-09-19 11:00:00

这篇关于 pandas 使用单独的时区列转换日期时间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆