Writing to compound dataset with variable length string via h5py (HDF5)

Question

I've been able to create a compound dataset consisting of an unsigned int and a variable-length string in my HDF5 file using h5py, but I can't write to it.

import h5py
import numpy as np
dt = h5py.special_dtype(vlen=str)
dset = fout.create_dataset(ver, (1,), dtype=np.dtype([("time", np.uint64), ("value", dt)]))

I've written to other compound datasets fairly easily, by setting the specific column(s) of the compound dataset as equal to an existing numpy array.

Now where I run into trouble is writing to the compound dataset with a variable-length string. NumPy does not support variable-length strings, so I can't create the NumPy array beforehand that would contain the value.

My next thought was to write the individual value to the column in question, and this works for the unsigned int. When I try to write a string to the variable-length string field in the compound dataset, though, I get:

    dset["value"] = str("blah")
  File "D:\Anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 508, in __setitem__
    val = val.astype(numpy.dtype([(names[0], dtype)]))
ValueError: Setting void-array with object members using buffer.

Any guidance would be appreciated.

Answer

I ran this test (h5py version '2.2.1'):

In [4]: import h5py
In [5]: dt = h5py.special_dtype(vlen=str)
In [6]: f=h5py.File('foo.hdf5', 'a')
In [8]: ds1 = f.create_dataset('JustStrings',(10,), dtype=dt)
In [10]: ds1[0]='string'
In [11]: ds1[1]='a longer string'
In [13]: ds1[2:5]='one_string two_strings three'.split()

In [14]: ds1
Out[14]: <HDF5 dataset "JustStrings": shape (10,), type "|O4">

In [15]: ds1.value
Out[15]: 
array(['string', 'a longer string', 'one_string', 'two_strings', 'three',
       '', '', '', '', ''], dtype=object)
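The session above is from h5py 2.2.1. To the best of my knowledge, h5py 3.0 changed three things that affect it: files open read-only by default, Dataset.value was removed (ds[()] reads the whole dataset instead), and variable-length strings are read back as bytes. A minimal sketch of the same steps under those assumptions follows; it uses an in-memory file (core driver, no backing store) so nothing is written to disk, and to_str is a hypothetical helper, not part of h5py.

```python
import h5py


def to_str(value):
    """Hypothetical helper: h5py >= 3.0 reads vlen strings back as bytes."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


# In-memory file (core driver, no backing store) so nothing touches the disk.
f = h5py.File("foo.hdf5", "w", driver="core", backing_store=False)

dt = h5py.string_dtype()  # variable-length UTF-8 strings (h5py >= 2.10)
ds1 = f.create_dataset("JustStrings", (10,), dtype=dt)
ds1[0] = "string"
ds1[1] = "a longer string"
ds1[2:5] = "one_string two_strings three".split()

# ds1[()] reads the whole dataset; it replaces the removed ds1.value
just_strings = [to_str(v) for v in ds1[()]]
f.close()
```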

And for a mixed dtype like yours:

In [16]: ds2 = f.create_dataset('IntandStrings',(10,),
   dtype=np.dtype([("number",int),('astring',dt)]))
In [17]: ds2[0]=(1,'astring')
In [18]: ds2[1]=(10,'a longer string')
In [19]: ds2[2:4]=[(10,'a longer much string'),(0,'')]
In [20]: ds2.value
Out[20]: 
array([(1, 'astring'), (10, 'a longer string'),
       (10, 'a longer much string'), (0, ''), (0, ''), (0, ''), (0, ''),
       (0, ''), (0, ''), (0, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])
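The 'O' in that dtype suggests the array can be built beforehand after all: h5py represents the variable-length string field as a NumPy object field, which holds Python strings of any length. A minimal sketch under that assumption, using the question's field names; the dataset name "ver", the file name, and the sample records are placeholders, and the file is kept in memory.

```python
import h5py
import numpy as np

# Compound type from the question: uint64 "time" plus a variable-length string
# "value". In NumPy the string field is an object field tagged by h5py.
compound = np.dtype([("time", np.uint64), ("value", h5py.string_dtype())])

# Build the complete records in NumPy first; object fields hold str of any length.
records = np.array(
    [(1, "blah"), (2, "a much longer string than the first")],
    dtype=compound,
)

with h5py.File("compound.hdf5", "w", driver="core", backing_store=False) as f:
    # "ver" stands in for the dataset name used in the question
    dset = f.create_dataset("ver", (len(records),), dtype=compound)
    dset[...] = records  # one write of whole records, no per-field assignment
    readback = dset[()]

times = readback["time"].tolist()
# h5py >= 3.0 returns bytes for vlen strings; decode them for comparison
values = [v.decode("utf-8") if isinstance(v, bytes) else v for v in readback["value"]]
```

Writing whole records sidesteps the single-field code path that raises the ValueError in the question.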

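Trying to set a field of a single record by itself does not seem to work. ds2['astring'] reads the field into a new NumPy array, so the assignment only changes that in-memory copy and the file is left untouched: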

ds2['astring'][4]='one two three four'

Instead I have to set the whole record:

ds2[4]=(123,'one two three four')
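If you only know the new string and not the rest of the record, one option is read-modify-write: read the record, change the field in memory, and write the whole record back. A minimal sketch, assuming h5py >= 2.10 for h5py.string_dtype(); set_string_field is a hypothetical helper and "placeholder" is sample data.

```python
import h5py
import numpy as np


def set_string_field(dset, index, field, text):
    """Hypothetical helper: update one field of one record by read-modify-write.

    dset[field][index] = text would only modify a temporary NumPy copy, so the
    record is read as a length-1 structured array, changed in memory, and the
    whole record is written back.
    """
    rec = dset[index:index + 1]   # structured ndarray, shape (1,)
    rec[field][0] = text          # rec[field] is a view, so this edits rec
    dset[index:index + 1] = rec   # write the complete record


compound = np.dtype([("number", np.int32), ("astring", h5py.string_dtype())])

with h5py.File("rmw.hdf5", "w", driver="core", backing_store=False) as f:
    ds2 = f.create_dataset("IntandStrings", (10,), dtype=compound)
    ds2[4] = (123, "placeholder")
    set_string_field(ds2, 4, "astring", "one two three four")
    row = ds2[4]

number = int(row["number"])
text = row["astring"]
if isinstance(text, bytes):  # h5py >= 3.0 returns bytes for vlen strings
    text = text.decode("utf-8")
```

The numeric field keeps its value because the helper writes back exactly the record it read.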

Trying to set the whole field produces the same ValueError as in the question:

ds2['astring']='astring'

I initialized this dataset to (10,), while yours is (1,), but I think it's the same problem.

I can, though, set the whole numeric field:

In [48]: ds2['number']=np.arange(10)
In [50]: ds2['number']
Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [51]: ds2.value
Out[51]: 
array([(0, 'astring'), (1, 'a longer string'), 
       (2, 'a longer much string'),
       (3, ''), (4, 'one two three four'), (5, ''), 
       (6, ''), (7, ''),
       (8, ''), (9, '')], 
      dtype=[('number', '<i4'), ('astring', 'O')])
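Since whole-record writes work but whole-field writes of the string column do not, a possible workaround for the question's dset["value"] = str("blah") is to read all records, replace the column in NumPy, and write everything back. A minimal sketch, assuming the question's (1,) layout; replace_field is a hypothetical helper, and "ver", the file name, and the sample values are placeholders.

```python
import h5py
import numpy as np


def replace_field(dset, field, values):
    """Hypothetical helper: overwrite one whole field of a compound dataset.

    Reads all records, assigns the new column in memory (a scalar is broadcast
    to every row), then writes all records back in a single call.
    """
    data = dset[...]       # full structured array, every field
    data[field] = values   # plain NumPy assignment into the object field
    dset[...] = data       # one write of complete records


# Same layout as the question: shape (1,), uint64 "time", vlen string "value".
compound = np.dtype([("time", np.uint64), ("value", h5py.string_dtype())])

with h5py.File("field.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ver", (1,), dtype=compound)
    dset[0] = (42, "old")
    replace_field(dset, "value", "blah")  # instead of dset["value"] = "blah"
    result = dset[()]

time0 = int(result["time"][0])
value0 = result["value"][0]
if isinstance(value0, bytes):  # h5py >= 3.0 returns bytes for vlen strings
    value0 = value0.decode("utf-8")
```

This reads the entire dataset into memory, which is fine for small datasets; for large ones the same pattern could be applied slice by slice.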
