Losing timezone-awareness when saving hierarchical pandas DatetimeIndex to hdf5 in Python


Question

I'm on pandas 0.14.1. Assume I need to index data by two timezone-aware timestamps in a hierarchical index. When saving the resulting DataFrame to hdf5, I seem to lose timezone-awareness:

import pandas as pd
dti1 = pd.DatetimeIndex(start=pd.Timestamp('20000101'), end=pd.Timestamp('20000102'), freq='D', tz='EST5EDT')
dti2 = pd.DatetimeIndex(start=pd.Timestamp('20000102'), end=pd.Timestamp('20000103'), freq='D', tz='EST5EDT')
mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=['a'])

Here df is timezone-aware:

                                                     a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00  0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00  0
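
On recent pandas releases the `start=`/`end=` keywords of the `pd.DatetimeIndex` constructor are no longer accepted, so the setup above fails there. A minimal sketch of the same frame built with `pd.date_range`, assuming a current pandas install (the variable names simply mirror the question; `level_timezones` is added only for inspection):

```python
import pandas as pd

# pd.date_range takes the same start/end/freq/tz arguments
dti1 = pd.date_range(start="2000-01-01", end="2000-01-02", freq="D", tz="EST5EDT")
dti2 = pd.date_range(start="2000-01-02", end="2000-01-03", freq="D", tz="EST5EDT")

mux = pd.MultiIndex.from_arrays([dti1, dti2])
df = pd.DataFrame(0, index=mux, columns=["a"])

# Collect the timezone each index level reports, as strings
level_timezones = [str(level.tz) for level in df.index.levels]
```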

After saving to hdf5 and loading it back, the timezone information seems to disappear:

df.to_hdf('/tmp/my.h5', 'data')
pd.read_hdf('/tmp/my.h5', 'data')

This results in:

                                         a
2000-01-01 05:00:00 2000-01-02 05:00:00  0
2000-01-02 05:00:00 2000-01-03 05:00:00  0

I wonder if there is a good workaround and whether this is a known bug.

Answer

This is not supported under the fixed format when using a multi-index. It should probably raise a not-implemented error instead of silently dropping the timezone. There is an issue open to track this.

See the pandas HDF5 documentation for the full interface.

With the default fixed format, the index levels come back timezone-naive:

In [11]: pd.read_hdf('/tmp/my.h5', 'data').index.levels[0]
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 05:00:00, 2000-01-02 05:00:00]
Length: 2, Freq: None, Timezone: None

But if you specify the table format, it works:

In [13]: df.to_hdf('/tmp/my.h5', 'data2', format='table')

In [14]: pd.read_hdf('/tmp/my.h5', 'data2')
Out[14]: 
                                                     a
2000-01-01 00:00:00-05:00 2000-01-02 00:00:00-05:00  0
2000-01-02 00:00:00-05:00 2000-01-03 00:00:00-05:00  0

In [15]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[0]
Out[15]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00-05:00, 2000-01-02 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT

In [16]: pd.read_hdf('/tmp/my.h5', 'data2').index.levels[1]
Out[16]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-02 00:00:00-05:00, 2000-01-03 00:00:00-05:00]
Length: 2, Freq: None, Timezone: EST5EDT
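
If a file has already been written in the fixed format, the output above suggests the levels come back as naive timestamps shifted to UTC (midnight EST became 05:00). Under that assumption, the timezone can be re-attached after loading. The helper below is a hypothetical sketch and not part of pandas; `restore_tz` and its arguments are names chosen for illustration, and the loss is simulated in memory rather than by an actual HDF5 round trip:

```python
import pandas as pd

def restore_tz(index, tz):
    """Rebuild a MultiIndex, treating naive datetime levels as UTC and converting them to `tz`."""
    arrays = []
    for i in range(index.nlevels):
        values = index.get_level_values(i)
        # Only touch datetime levels that carry no timezone
        if isinstance(values, pd.DatetimeIndex) and values.tz is None:
            values = values.tz_localize("UTC").tz_convert(tz)
        arrays.append(values)
    return pd.MultiIndex.from_arrays(arrays, names=index.names)

# Simulate what the fixed-format round trip returned: naive UTC values
aware = pd.date_range("2000-01-01", "2000-01-02", freq="D", tz="EST5EDT")
naive_utc = aware.tz_convert("UTC").tz_localize(None)
lost = pd.MultiIndex.from_arrays([naive_utc, naive_utc])

restored = restore_tz(lost, "EST5EDT")
first_level = restored.get_level_values(0)
```

On a loaded frame this would be applied as `df.index = restore_tz(df.index, 'EST5EDT')`. It only makes sense when every naive level was originally stored from the same timezone; writing with `format='table'` avoids the need for it.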
