Python (pandas): store a data frame in HDF5 with a MultiIndex

Question

I need to work with a large data frame that has a MultiIndex, so I created a small one to learn how to store it in an HDF5 file. The data frame looks like this (the MultiIndex is in the first two columns):

Symbol    Date          0

C         2014-07-21    4792
B         2014-07-21    4492
A         2014-07-21    5681
B         2014-07-21    8310
A         2014-07-21    1197
C         2014-07-21    4722
          2014-07-21    7695
          2014-07-21    1774

I'm using pandas.to_hdf, but it creates a "Fixed format store", so when I try to select the data in a group:

store.select('table','Symbol == "A"')

it raises an error, and the key part is this:

TypeError: cannot pass a where specification when reading from a Fixed format store. this store must be selected in its entirety
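The error comes from the storage format: HDFStore.put (and to_hdf) write the "fixed" format by default, which can only be read back in its entirety, while format='table' writes a queryable PyTables table. The following is a minimal sketch of that contrast, assuming Python 3 with a recent pandas and PyTables installed; the file name, the keys fixed_df and table_df, and the column name value are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small MultiIndex frame; the level names mirror the question (Symbol, Date)
idx = pd.MultiIndex.from_product(
    [["A", "B", "C"], pd.date_range("2014-07-21", periods=2)],
    names=["Symbol", "Date"],
)
df = pd.DataFrame({"value": np.arange(6)}, index=idx)

path = os.path.join(tempfile.mkdtemp(), "fixed_vs_table.h5")
with pd.HDFStore(path, mode="w") as store:
    # put() without format= writes the default "fixed" format
    store.put("fixed_df", df)
    # format="table" writes a table whose index levels can be queried
    store.put("table_df", df, format="table")

    fixed_error = None
    try:
        # A where clause on a fixed store raises TypeError
        store.select("fixed_df", where='Symbol == "A"')
    except TypeError as exc:
        fixed_error = exc

    # The same query succeeds on the table-format copy
    subset = store.select("table_df", where='Symbol == "A"')

print("fixed store:", fixed_error)
print(subset)
```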

Then I tried appending the DataFrame like this:

store.append('ts1',timedata)

That should create a table, but it gives me another error:

TypeError: [unicode] is not implemented as a table column
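store.append always writes the table format, so calling it repeatedly is one way to build a large frame chunk by chunk. Below is a minimal sketch assuming Python 3 with a recent pandas and PyTables, where the string index values are ordinary str objects; it does not reproduce the Python 2 [unicode] error from the question. The helper make_chunk and the column name value are hypothetical.

```python
import os
import tempfile

import pandas as pd


def make_chunk(symbols, day):
    # Hypothetical helper: one row per symbol for a single day
    idx = pd.MultiIndex.from_arrays(
        [symbols, pd.to_datetime([day] * len(symbols))],
        names=["Symbol", "Date"],
    )
    return pd.DataFrame({"value": range(len(symbols))}, index=idx)


path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")
with pd.HDFStore(path, mode="w") as store:
    # append() creates a table-format node; the second call adds rows to it
    store.append("ts1", make_chunk(["A", "B", "C"], "2014-07-21"))
    store.append("ts1", make_chunk(["A", "B", "C"], "2014-07-22"))

    is_table = store.get_storer("ts1").is_table
    nrows = len(store.select("ts1"))
    # Index levels of a table-format node can be used in where clauses
    only_a = store.select("ts1", where='Symbol == "A"')

print(is_table, nrows)
print(only_a)
```

If a later chunk contains longer strings than the first one, append raises a ValueError unless space is reserved up front with its min_itemsize argument.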

So I need code that stores the data frame as a table in HDF5 format and selects the rows for a single value of one index level (for that purpose I found this code: store.select('timedata','Symbol == "A"')).
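Putting this together for the data in the question: a sketch, assuming the blank Symbol cells in the table repeat "C" (as pandas' sparsified MultiIndex display suggests) and using the hypothetical column name value in place of 0. Writing with format='table' makes the index level Symbol queryable by name, so the select call found in the question works as written. It assumes Python 3 with a recent pandas and PyTables; the temporary file path is illustrative.

```python
import os
import tempfile

import pandas as pd

# Rows from the question's table (blank Symbol cells assumed to repeat "C")
symbols = ["C", "B", "A", "B", "A", "C", "C", "C"]
values = [4792, 4492, 5681, 8310, 1197, 4722, 7695, 1774]
idx = pd.MultiIndex.from_arrays(
    [symbols, pd.to_datetime(["2014-07-21"] * len(symbols))],
    names=["Symbol", "Date"],
)
# "value" is a hypothetical column name standing in for the question's column 0
timedata = pd.DataFrame({"value": values}, index=idx)

path = os.path.join(tempfile.mkdtemp(), "timedata.h5")
# format="table" is what makes the where query possible
timedata.to_hdf(path, key="timedata", mode="w", format="table")

with pd.HDFStore(path, mode="r") as store:
    rows_a = store.select("timedata", 'Symbol == "A"')

print(rows_a)
```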

Answer

Here is an example:

In [8]: pd.__version__
Out[8]: '0.14.1'

In [9]: np.__version__
Out[9]: '1.8.1'

In [10]: import sys

In [11]: sys.version
Out[11]: '2.7.3 (default, Jan  7 2013, 09:17:50) \n[GCC 4.4.5]'

In [4]: df = pd.DataFrame(np.arange(9).reshape(9,-1),index=pd.MultiIndex.from_product([list('abc'),pd.date_range('20140721',periods=3)],names=['symbol','date']),columns=['value'])

In [5]: df
Out[5]: 
                   value
symbol date             
a      2014-07-21      0
       2014-07-22      1
       2014-07-23      2
b      2014-07-21      3
       2014-07-22      4
       2014-07-23      5
c      2014-07-21      6
       2014-07-22      7
       2014-07-23      8

In [6]: df.to_hdf('test.h5','df',mode='w',format='table')

In [7]: pd.read_hdf('test.h5','df',where='date=20140722')
Out[7]: 
                   value
symbol date             
a      2014-07-22      1
b      2014-07-22      4
c      2014-07-22      7

In [12]: pd.read_hdf('test.h5','df',where='symbol="a"')
Out[12]: 
                   value
symbol date             
a      2014-07-21      0
       2014-07-22      1
       2014-07-23      2
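The same where syntax can filter on both index levels at once. This is a minimal sketch assuming a current pandas and PyTables under Python 3 rather than the 0.14.1 / Python 2.7 setup shown above; the temporary file path is an assumption, and dates are written as quoted strings instead of bare integers.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same frame as in the session above, with explicit pd./np. prefixes
df = pd.DataFrame(
    np.arange(9).reshape(9, -1),
    index=pd.MultiIndex.from_product(
        [list("abc"), pd.date_range("20140721", periods=3)],
        names=["symbol", "date"],
    ),
    columns=["value"],
)
path = os.path.join(tempfile.mkdtemp(), "test.h5")
df.to_hdf(path, key="df", mode="w", format="table")

# Conditions on different index levels combined with & inside one string
both = pd.read_hdf(path, key="df", where='symbol == "a" & date > "2014-07-21"')
# A list of conditions is also accepted; pandas combines them with &
as_list = pd.read_hdf(path, key="df", where=['symbol == "b"', 'date == "2014-07-23"'])

print(both)
print(as_list)
```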
