Pandas can't read hdf5 file created with h5py

Problem description

I get a pandas error when I try to read HDF5 files that I created with h5py. Am I just doing something wrong?

import h5py
import numpy as np
import pandas as pd
h5_file = h5py.File('test.h5', 'w')
h5_file.create_dataset('zeros', data=np.zeros(shape=(3, 5)), dtype='f')
h5_file.close()
pd_file = pd.read_hdf('test.h5', 'zeros')

This gives an error: TypeError: cannot create a storer if the object is not existing nor a value are passed

I also tried setting the key to '/zeros' (as I would with h5py when reading the file), with no luck.

If I use pandas.HDFStore to read it in, I get an empty store back:

>>> store = pd.HDFStore('test.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
Empty

I have no trouble reading the newly created file back with h5py:

h5_back = h5py.File('test.h5', 'r')
h5_back['/zeros']
<HDF5 dataset "zeros": shape (3, 5), type "<f4">

Using the following versions:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

pd.__version__
'0.16.2'
h5py.__version__
'2.5.0'

Thanks a lot, Masha

Recommended answer

I've worked a little on the pytables module in pandas.io, and from what I know, pandas' interaction with HDF files is limited to specific structures that pandas understands. To see what these look like, you can try:

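import pandas as pd
import numpy as np
# Write a DataFrame so that pandas creates its own on-disk layout (needs PyTables installed).
pd.DataFrame(np.zeros((3, 5), dtype=np.float32)).to_hdf('test.h5', key='test')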

If you open 'test.h5' in HDFView, you will see a path /test containing the 4 items that are needed to recreate the DataFrame.

So I think your only option for reading in NumPy arrays is to read them in directly and then convert them to pandas objects.
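A minimal sketch of that approach, assuming the file layout from the question (a single dataset named 'zeros' written by h5py). The temporary path and the variable names are illustrative only and are not part of the original post:

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

# Recreate the file from the question with plain h5py.
with h5py.File(path, "w") as h5_file:
    h5_file.create_dataset("zeros", data=np.zeros(shape=(3, 5)), dtype="f")

# Read the dataset directly with h5py; indexing with [()] loads
# the whole dataset into memory as a NumPy array.
with h5py.File(path, "r") as h5_back:
    zeros = h5_back["/zeros"][()]

# Convert the NumPy array to a pandas object.
# Row and column labels default to integer ranges.
df = pd.DataFrame(zeros)
print(df.shape)
print(df.dtypes)
```

If the data should later be readable with pd.read_hdf, one option is to write this DataFrame back out with df.to_hdf, which stores it in the layout pandas expects.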
