Pandas convert UNIX time to multiple different timezones depending on column value
Problem description
I have a pandas dataframe with UNIX timestamps (these are integers, not time objects). The observations occur in multiple geographic locations, and therefore in multiple timezones. I'd like to convert each UNIX timestamp into local time (in a new column), based on the geography of the observation (this information is in a column of the dataframe).
Minimal working example:
Create the dataframe:
import pandas as pd
c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']
df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])
print(df3)
Expected output:
          utc                   tz
0  1546555701      America/Detroit
1  1546378818      America/Chicago
2  1546574677  America/Los_Angeles
3  1546399159  America/Los_Angeles
4  1546572278      America/Detroit
Current attempt:
df3['date_time']=pd.to_datetime(df3['utc'],unit='s')
print(df3)
Returns:
          utc                   tz           date_time
0  1546555701      America/Detroit 2019-01-03 22:48:21
1  1546378818      America/Chicago 2019-01-01 21:40:18
2  1546574677  America/Los_Angeles 2019-01-04 04:04:37
3  1546399159  America/Los_Angeles 2019-01-02 03:19:19
4  1546572278      America/Detroit 2019-01-04 03:24:38
This converts to a datetime object, but I am unsure how to control the timezone (I presume it gives me the time in my local timezone). It is certainly not based on the 'tz' column.
I have looked at pandas' tz_convert() function and the arrow package, but have not been able to figure out how to make these work. I am open to other solutions as well. I am concerned not only with the timezone, but also with making sure that daylight saving time is handled properly.
Recommended answer
Assuming POSIX timestamps (seconds since 1970-01-01 UTC), you can convert directly to timezone-aware UTC datetimes with the keyword utc=True.
import pandas as pd
c1=[1546555701, 1546378818, 1546574677, 1546399159, 1546572278]
c2=['America/Detroit','America/Chicago','America/Los_Angeles','America/Los_Angeles','America/Detroit']
df3=pd.DataFrame(list(zip(c1,c2)),columns=['utc','tz'])
df3['date_time']=pd.to_datetime(df3['utc'], unit='s', utc=True)
# df3['date_time']
# 0 2019-01-03 22:48:21+00:00
# 1 2019-01-01 21:40:18+00:00
# 2 2019-01-04 04:04:37+00:00
# 3 2019-01-02 03:19:19+00:00
# 4 2019-01-04 03:24:38+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
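As a small check on the question's guess about local time, the sketch below compares the result with and without utc=True. The variable names (epoch_seconds, naive, aware) are illustrative and not part of the original answer. Under the POSIX assumption above, both calls should give the same wall-clock values in UTC; the only difference is whether the result carries a time zone.

```python
import pandas as pd

# Two of the epoch values from the question
epoch_seconds = pd.Series([1546555701, 1546378818])

# Without utc=True the result is timezone-naive
naive = pd.to_datetime(epoch_seconds, unit="s")
# With utc=True the result is timezone-aware and labelled as UTC
aware = pd.to_datetime(epoch_seconds, unit="s", utc=True)

# Dropping the UTC label from the aware values should reproduce the naive ones,
# i.e. the naive values are UTC wall-clock times, not machine-local times
same_wall_clock = bool((naive == aware.dt.tz_localize(None)).all())

print(naive.dt.tz)   # None
print(aware.dt.tz)   # UTC
print(same_wall_clock)
```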
You can then convert each value to the time zone named in its row using apply, e.g.
def setTZ(row):
    # convert the UTC timestamp of this row to the zone named in its 'tz' column
    return row['date_time'].tz_convert(row['tz'])
df3['date_time']=df3.apply(setTZ, axis=1)
# df3
#           utc                   tz                  date_time
# 0  1546555701      America/Detroit  2019-01-03 17:48:21-05:00
# 1  1546378818      America/Chicago  2019-01-01 15:40:18-06:00
# 2  1546574677  America/Los_Angeles  2019-01-03 20:04:37-08:00
# 3  1546399159  America/Los_Angeles  2019-01-01 19:19:19-08:00
# 4  1546572278      America/Detroit  2019-01-03 22:24:38-05:00
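On the daylight saving time concern from the question: tz_convert with an IANA zone name picks the UTC offset that was in force at each instant, so no separate DST handling should be needed. The sketch below illustrates this with two epoch values chosen for this example (they are not from the question), one in January and one in July 2019.

```python
import pandas as pd

# Illustrative epoch values, not from the question
winter = pd.to_datetime(1547553600, unit="s", utc=True)  # 2019-01-15 12:00 UTC
summer = pd.to_datetime(1563192000, unit="s", utc=True)  # 2019-07-15 12:00 UTC

# Same zone name for both; the offset depends on the date
winter_local = winter.tz_convert("America/Detroit")
summer_local = summer.tz_convert("America/Detroit")

print(winter_local)  # 2019-01-15 07:00:00-05:00 (standard time)
print(summer_local)  # 2019-07-15 08:00:00-04:00 (daylight saving time)
```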
Note that with mixed time zones in one column, you can't use the dt accessor on the Series. You need iterative code instead, e.g.
df3['date_time'].apply(lambda t: t.hour)
to get the hour for each datetime. A way around this is to create a column that holds local time but is not time zone aware:
def toLocalTime(row):
    # convert to the row's zone, then drop the tz info to keep only the local wall-clock time
    return row['date_time'].tz_convert(row['tz']).replace(tzinfo=None)
df3['local_time'] = df3.apply(toLocalTime, axis=1)
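If the row-wise apply turns out to be slow on a large frame, one possible alternative (not part of the original answer) is to convert one time zone group at a time, so that each conversion is vectorized. The sketch below rebuilds the example frame under the name df and uses a helper Series called utc_time; both names are illustrative. Because the resulting column is timezone-naive, the dt accessor works on it.

```python
import pandas as pd

df = pd.DataFrame({
    "utc": [1546555701, 1546378818, 1546574677, 1546399159, 1546572278],
    "tz": ["America/Detroit", "America/Chicago", "America/Los_Angeles",
           "America/Los_Angeles", "America/Detroit"],
})

# Timezone-aware UTC datetimes, as in the answer above
utc_time = pd.to_datetime(df["utc"], unit="s", utc=True)

# Convert each group of rows sharing a zone in one vectorized call,
# then drop the tz info so all pieces share one naive datetime dtype
pieces = [
    group.dt.tz_convert(tz_name).dt.tz_localize(None)
    for tz_name, group in utc_time.groupby(df["tz"])
]

# Assignment aligns on the index, so the original row order is preserved
df["local_time"] = pd.concat(pieces)

print(df["local_time"].dt.hour.tolist())  # [17, 15, 20, 19, 22]
```

The trade-off is the same as with toLocalTime: the offset is lost, so the column is only meaningful together with the tz column.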