Error when trying to save an HDF5 row where one column is a string and the other is an array of floats
Problem description
I have two columns: one is a string, and the other is a numpy array of floats.
a = 'this is string'
b = np.array([-2.355, 1.957, 1.266, -6.913])
I would like to store them in one row, as separate columns, in an HDF5 file. For that I am using pandas:
hdf_key = 'hdf_key'
store5 = pd.HDFStore('file.h5')
z = pd.DataFrame(
    {
        'string': [a],
        'array': [b]
    })
store5.append(hdf_key, z, index=False)
store5.close()
However, I get this error:
TypeError: Cannot serialize the column [array] because
its data contents are [mixed] object dtype
Is there a way to store this in an HDF5 file? If so, how? If not, what is the best way to store this sort of data?
Recommended answer
I can't help you with pandas, but I can show you how to do this with PyTables. Basically, you create a table whose description is either a numpy recarray or a dtype that defines the mixed datatypes.
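As a minimal sketch of what such a mixed dtype can look like (NumPy only, no file I/O), the question's two values can be packed into one structured record. The field names `string` and `array`, the 16-byte string width, and the 4-element sub-array field are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

a = 'this is string'
b = np.array([-2.355, 1.957, 1.266, -6.913])

# Assumed layout: a fixed-width byte string plus a float64 sub-array of length 4.
row_dtype = np.dtype([('string', 'S16'), ('array', np.float64, (4,))])

# One-row structured array; each row holds a (string, array) pair.
rows = np.zeros(1, dtype=row_dtype)
rows['string'][0] = a   # ASCII text is encoded to bytes on assignment
rows['array'][0] = b

print(rows.dtype)
print(rows[0])
```

PyTables tables support multidimensional columns, so a dtype of this shape should be usable as a table description as well, although the answer's own example uses four scalar float columns instead.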
Below is a very simple example that creates a table with 1 string column and 4 float columns, then adds rows of data to it.
It shows 2 different methods to add data:
1. A list of tuples (1 tuple for each row) - see append_list
2. A numpy recarray (with dtype matching the table definition) - see simple_recarr in the for loop
To get the rest of the arguments for create_table(), read the PyTables documentation. It's very helpful and should answer additional questions. Link below:
PyTables User's Guide
import tables as tb
import numpy as np

with tb.open_file('SO_55943319.h5', 'w') as h5f:
    # Table description: one 16-byte string column and four float columns
    my_dtype = np.dtype([('A','S16'),('b',float),('c',float),('d',float),('e',float)])
    dset = h5f.create_table(h5f.root, 'table_data', description=my_dtype)

    # Append one row using a list:
    append_list = [('test string', -2.355, 1.957, 1.266, -6.913)]
    dset.append(append_list)

    # Append five more rows, one at a time, using a 1-row recarray:
    simple_recarr = np.recarray((1,),dtype=my_dtype)
    for i in range(5):
        simple_recarr['A']='string_' + str(i)
        simple_recarr['b']=2.0*i
        simple_recarr['c']=3.0*i
        simple_recarr['d']=4.0*i
        simple_recarr['e']=5.0*i
        dset.append(simple_recarr)

print('done')
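To check what was written, the table can be read back into a NumPy structured array. The following is a hedged sketch, not part of the original answer: the file name `readback_demo.h5` is hypothetical, and the string is written as bytes to match the `S16` column.

```python
import numpy as np
import tables as tb

my_dtype = np.dtype([('A', 'S16'), ('b', float), ('c', float), ('d', float), ('e', float)])

# Write a one-row table (the file name is a made-up example).
with tb.open_file('readback_demo.h5', 'w') as h5f:
    dset = h5f.create_table(h5f.root, 'table_data', description=my_dtype)
    dset.append([(b'test string', -2.355, 1.957, 1.266, -6.913)])

# Reopen read-only and load the whole table into memory.
with tb.open_file('readback_demo.h5', 'r') as h5f:
    data = h5f.root.table_data.read()

first = data[0]
label = first['A'].decode('ascii')   # S16 columns come back as bytes
values = np.array([first['b'], first['c'], first['d'], first['e']])
print(label, values)
```

Rebuilding `values` from the four scalar columns recovers the original float array from the question.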