H5PY - How to store many 2D arrays of different dimensions


Question

I would like to organize data collected from computer simulations into an HDF5 file using Python. I measured the positions and velocities [x,y,z,vx,vy,vz] of all atoms within a certain region of space over many time steps. The number of atoms, of course, varies from time step to time step.

A minimal example looks as follows:

[
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ],
[ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ] 
]

(2 time steps; first time step: 2 atoms, second time step: 3 atoms)

My idea was to create an HDF5 dataset in Python that stores all the information. At each time step it should hold a 2D array with the positions and velocities of all atoms, i.e.

dataset[0] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2] ]
dataset[1] = [ [x1,y1,z1,vx1,vy1,vz1], [x2,y2,z2,vx2,vy2,vz2], [x3,y3,z3,vx3,vy3,vz3] ].

The idea is clear, I think. However, I am struggling to define the correct data type for a dataset whose arrays vary in length.
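One way to handle the varying length, sketched here only as an illustration and not as the approach taken later in this thread, is a variable-length type whose elements are plain float32 values. Each (n_atoms, 6) array is flattened before it is written and reshaped after it is read back. The file name, the made-up values, and the constant `N_COLUMNS` are assumptions introduced for this sketch.

```python
import os
import tempfile

import h5py
import numpy as np

N_COLUMNS = 6  # x, y, z, vx, vy, vz

# Two time steps with 2 and 3 atoms (made-up values for illustration).
steps = [
    np.arange(2 * N_COLUMNS, dtype=np.float32).reshape(2, N_COLUMNS),
    np.arange(3 * N_COLUMNS, dtype=np.float32).reshape(3, N_COLUMNS),
]

# Variable-length type whose elements are plain float32 values.
dt = h5py.special_dtype(vlen=np.dtype("float32"))

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "vlen_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("dset", (len(steps),), dtype=dt)
    for i, step in enumerate(steps):
        # Flatten (n_atoms, 6) to 1D; the element count may differ per entry.
        dset[i] = step.ravel()

with h5py.File(path, "r") as f:
    # Restore the 2D layout: the column count is fixed, the row count is not.
    restored = [f["dset"][i].reshape(-1, N_COLUMNS) for i in range(len(steps))]

print([a.shape for a in restored])
```

The trade-off is that the column count is no longer recorded in the data type, so the reader has to know it (here via `N_COLUMNS`).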

My code is as follows:

import numpy as np
import h5py

file = h5py.File ('file.h5','w')

columnNo = 6    
rowtype = np.dtype("%sfloat32" % columnNo)
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )

dataset = file.create_dataset("dset", (2,), dtype=dt)

print dataset.value

testarray = np.array([[1.,2.,3.,2.,3.,4.],[1.,2.,3.,2.,3.,4.]])
print testarray

dataset[0] = testarray
print dataset[0]

This, however, does not work. When I run the script, I get the error message "AttributeError: 'float' object has no attribute 'dtype'". It seems that the dtype I defined is wrong.

Does anybody see how it should be defined correctly?

Thanks a lot, Sven

Recommended answer

Thanks for the quick answer. It helped a lot.

If I now simply change the data type of the dataset to

dtype = dt,

I get what I wanted.

Here is the Python code (for completeness):

import numpy as np
import h5py

file = h5py.File ('file.h5','w')

columnNo = 6

rowtype = np.dtype([('f0', '<f4',(6,))])
dt = h5py.special_dtype( vlen=np.dtype(rowtype) )

print('rowtype',rowtype)
print('dt',dt)
dataset = file.create_dataset("dset", (2,), dtype=dt)

# print('value')
# print(dataset.value[0])

arr = np.ones((3,),dtype=rowtype)
# print(repr(arr))
dataset[0] = arr
# print(dataset.value)

testarray = np.array([([1.,2.,3.,2.,3.,4.],),([2.,3.,4.,1.,2.,3.],)], dtype=rowtype)
# print(repr(testarray))

dataset[1] = testarray
# dataset[()] reads the whole dataset (dataset.value was removed in h5py 3.0)
print(dataset[()])
for i in range(2):
    print(dataset[i])

and the corresponding output reads:

('rowtype', dtype([('f0', '<f4', (6,))]))
('dt', dtype('O'))
[ array([([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],),
       ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],), ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)], 
      dtype=[('f0', '<f4', (6,))])
 array([([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],), ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)], 
      dtype=[('f0', '<f4', (6,))])]
[([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],) ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)
 ([1.0, 1.0, 1.0, 1.0, 1.0, 1.0],)]
[([1.0, 2.0, 3.0, 2.0, 3.0, 4.0],) ([2.0, 3.0, 4.0, 1.0, 2.0, 3.0],)]
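Each stored entry above is a record array with a single field `f0`, not a plain 2D array. A minimal NumPy-only sketch of how to get from one form to the other, using the same `rowtype` and `testarray` values as the code above (the names `plain` and `wrapped` are introduced here for illustration):

```python
import numpy as np

rowtype = np.dtype([("f0", "<f4", (6,))])

testarray = np.array(
    [([1., 2., 3., 2., 3., 4.],), ([2., 3., 4., 1., 2., 3.],)],
    dtype=rowtype,
)

# Selecting the single field yields an ordinary float32 array
# of shape (n_atoms, 6).
plain = testarray["f0"]
print(plain.shape)
print(plain[:, 3:])  # velocity columns vx, vy, vz

# Going the other way: wrap a plain (n_atoms, 6) array in the record type.
wrapped = np.zeros(plain.shape[0], dtype=rowtype)
wrapped["f0"] = plain
print(wrapped.shape)
```

This keeps the simulation code working with ordinary (n_atoms, 6) arrays and confines the record type to the point where data is written or read.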

Just to get it right: the problem in my original code was a bad definition of my rowtype data structure, right?
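The difference between the two rowtype definitions can be inspected with NumPy alone. In the sketch below, the tuple form `(np.float32, (6,))` is an assumed stand-in for the string-built sub-array type from the original code, and the variable names are made up. It only shows how NumPy treats the two types and makes no claim about what h5py does with either.

```python
import numpy as np

# Assumed equivalent of the original rowtype: a sub-array type without fields.
subarray_type = np.dtype((np.float32, (6,)))
# rowtype from the working code: a record type with one field holding 6 floats.
record_type = np.dtype([("f0", "<f4", (6,))])

print(subarray_type.names, subarray_type.shape)
print(record_type.names, record_type.shape)

# A sub-array type is absorbed into the array shape on creation,
# leaving a plain float32 array.
a = np.zeros((3,), dtype=subarray_type)
# A record type survives as the element type of a 1D array.
b = np.zeros((3,), dtype=record_type)

print(a.shape, a.dtype)
print(b.shape, b.dtype)
```

With the sub-array type, the elements of a row are individual floats, whereas with the record type each element is one complete [x,y,z,vx,vy,vz] record.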

Best, Sven
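For comparison, a different layout that avoids variable-length types altogether is to store each time step as its own 2D dataset inside a group. This is only a sketch of an alternative, not what the code above does. The group name `timesteps`, the `step_%05d` naming scheme, the file name, and the sample values are all assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Two time steps with 2 and 3 atoms (made-up values for illustration).
steps = [
    np.ones((2, 6), dtype=np.float32),
    np.full((3, 6), 2.0, dtype=np.float32),
]

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "steps_demo.h5")

with h5py.File(path, "w") as f:
    grp = f.create_group("timesteps")  # hypothetical group name
    for i, step in enumerate(steps):
        # Zero-padded names keep the datasets in time order when listed.
        grp.create_dataset("step_%05d" % i, data=step)

with h5py.File(path, "r") as f:
    grp = f["timesteps"]
    shapes = {name: grp[name].shape for name in grp}
    second = grp["step_00001"][()]  # read the whole dataset into memory

print(shapes)
```

Each dataset then keeps its own (n_atoms, 6) shape, at the cost of one HDF5 object per time step.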
