Pandas to_hdf succeeds but then read_hdf fails

Problem description

Pandas to_hdf succeeds, but read_hdf then fails when I use custom objects as column headers (I use custom objects because I need to store other information in them).

Is there some way to make this work? Or is this just a Pandas bug or a PyTables bug?

As an example, I first make a DataFrame foo with string column headers, and everything works fine with to_hdf/read_hdf. I then change foo to use a custom Col class for the column headers; to_hdf still works fine, but read_hdf raises an AssertionError:

In [48]: foo = pd.DataFrame(np.random.randn(2, 3), columns = ['aaa', 'bbb', 'ccc'])

In [49]: foo
Out[49]: 
        aaa       bbb       ccc
0 -0.434303  0.174689  1.373971
1 -0.562228  0.862092 -1.361979

In [50]: foo.to_hdf('foo.h5', 'foo')

In [51]: bar = pd.read_hdf('foo.h5', 'foo')

In [52]: bar
Out[52]: 
        aaa       bbb       ccc
0 -0.434303  0.174689  1.373971
1 -0.562228  0.862092 -1.361979

In [52]: 

In [53]: class Col(object):
...:     def __init__(self, name, other_info):
...:         self.name = name
...:         self.other_info = other_info
...:     def __str__(self):
...:         return self.name
...:     

In [54]: foo = pd.DataFrame(np.random.randn(2, 3), columns = [Col('aaa', {'z': 5}), Col('bbb', {'y': True}), Col('ccc', {})])

In [55]: foo
Out[55]: 
        aaa       bbb       ccc
0 -0.830503  1.066178  1.057349
1  0.406967 -0.131430  1.970204

In [56]: foo.to_hdf('foo.h5', 'foo')

In [57]: bar = pd.read_hdf('foo.h5', 'foo')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-57-888b061a1d2c> in <module>()
----> 1 bar = pd.read_hdf('foo.h5', 'foo')

/.../python3.4/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, **kwargs)
330 
331     try:
--> 332         return store.select(key, auto_close=auto_close, **kwargs)
333     except:
334         # if there is an error, close the store

/.../python3.4/site-packages/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
672                            auto_close=auto_close)
673 
--> 674         return it.get_result()
675 
676     def select_as_coordinates(

/.../python3.4/site-packages/pandas/io/pytables.py in get_result(self, coordinates)
   1366 
   1367         # directly return the result
-> 1368         results = self.func(self.start, self.stop, where)
   1369         self.close()
   1370         return results

/.../python3.4/site-packages/pandas/io/pytables.py in func(_start, _stop, _where)
665             return s.read(start=_start, stop=_stop,
666                           where=_where,
--> 667                           columns=columns, **kwargs)
668 
669         # create the iterator

/.../python3.4/site-packages/pandas/io/pytables.py in read(self, **kwargs)
   2792             blocks.append(blk)
   2793 
-> 2794         return self.obj_type(BlockManager(blocks, axes))
   2795 
   2796     def write(self, obj, **kwargs):

/.../python3.4/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2180         self._consolidate_check()
   2181 
-> 2182         self._rebuild_blknos_and_blklocs()
   2183 
   2184     def make_empty(self, axes=None):

/.../python3.4/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
   2271 
   2272         if (new_blknos == -1).any():
-> 2273             raise AssertionError("Gaps in blk ref_locs")
   2274 
   2275         self._blknos = new_blknos

AssertionError: Gaps in blk ref_locs

Update:

So Jeff answered (a) "this is not supported" and (b) "if you have meta-data then write it to the attributes".

Question 1 regarding (a): My column header objects have methods that return their properties. For example, instead of a column header string 'x5y3z8' from which I would have to parse out the values, I can simply write col_header.x (gives 5), col_header.y (gives 3), and so on. This is object-oriented and Pythonic, as opposed to storing the information in a string and parsing it every time I need it. How do you suggest I replace my current column header objects in a nice way that is also supported?

(BTW, you might look at 'x5y3z8' and think a hierarchical index would work, but it does not, because not every column header has the form 'x#y#z#'. I might have one column 'foo' of strings, another column 'bar5baz7' of ints, and another column 'x5y3z8' of floats. The column headers aren't uniform.)

Question 2 regarding (a): When you say it's not supported, are you talking specifically about to_hdf/read_hdf not supporting it, or are you saying that Pandas in general doesn't support it? If only the HDF5 support is missing, then I could switch to some other way of saving the DataFrames to disk and have it work, right? Do you foresee any problems with that in the future? Will this ever break with to_pickle/read_pickle, for example? (I would lose performance, but I have to give up something, right?)

Question 3 regarding (b): What do you mean by "if you have meta-data then write it to the attributes"? Attributes of what? A simple example would help me a lot. I'm pretty new to Pandas. Thanks!

Recommended answer

This is not a supported feature.

This will raise in the next version of pandas (at write time) for format='table'. It should raise for the fixed format as well, but that's not implemented. This is simply not supported, nor is it likely to be. You should just use strings. If you have meta-data, then write it to the attributes.
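One way to read "use strings" and "write it to the attributes" together is sketched below. It is a minimal sketch and not necessarily what the answerer had in mind. It assumes PyTables is installed, keeps plain string column names, and stores the extra per-column information as a dictionary on the attributes of the stored node via HDFStore.get_storer(key).attrs. The helper names save_with_meta and load_with_meta and the attribute name column_meta are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def save_with_meta(df, column_meta, path, key):
    """Write df under `key`, then attach column_meta to that node's attributes."""
    with pd.HDFStore(path) as store:
        store.put(key, df)  # columns are plain strings, so this round-trips
        # `column_meta` is a hypothetical attribute name chosen for this sketch;
        # the dictionary is stored alongside the data in the same HDF5 node.
        store.get_storer(key).attrs.column_meta = column_meta


def load_with_meta(path, key):
    """Read back both the DataFrame and the metadata dictionary."""
    with pd.HDFStore(path, mode="r") as store:
        df = store[key]
        column_meta = store.get_storer(key).attrs.column_meta
    return df, column_meta
```

With the data from the question, this would be called as save_with_meta(foo, {'aaa': {'z': 5}, 'bbb': {'y': True}, 'ccc': {}}, 'foo.h5', 'foo') on a foo whose columns are 'aaa', 'bbb', 'ccc', and load_with_meta('foo.h5', 'foo') would return the frame together with the dictionary. Looking up column_meta['aaa']['z'] then plays the role of col_header.z without putting custom objects in the column index.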
