h5py writes are very slow
Problem description
I have an h5py dataset like the one below. I want to index the records by string instead of by numeric position, so that, for example, I could get the value of the first record with dset[dset.attrs['id1']].
I am trying to write the attributes with the code below, but it is extremely slow. If I run %timeit dset.attrs[rid] = idx inside the loop, a single write takes about 310 ms. The strings I am writing are 36 characters long. I have about 100k records to write, which would take about 9 hours. Something must be terribly wrong. The CPU is also pegged the whole time.
import h5py

ids = ['id1', 'id2', 'id3']
h5 = h5py.File("/tmp/ds.h5", "w")
dset = h5.create_dataset("lds", (100000,), dtype='float32')
for idx, rid in enumerate(ids):  # loop takes forever
    dset.attrs[rid] = idx  # each write takes about 310 ms
Edit
A minimal "working" example:
for idx, rid in enumerate(range(10)):
    %timeit dset.attrs[str(rid)] = idx
10 loops, best of 3: 470 ms per loop
10 loops, best of 3: 470 ms per loop
...
That is nearly 0.5 seconds for a single write.
Recommended answer
Pass the value latest for the libver parameter when opening the file. This is a lot faster. For example:
h5py.File('ds.h5', 'w', libver='latest')
See here: https://github.com/h5py/h5py/issues/705
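As a rough sketch of how the suggestion fits the original loop, the snippet below opens the file with libver='latest', writes one attribute per record ID, then reopens the file and looks a record up by its string ID. The temporary file path and the three placeholder IDs are assumptions made for illustration. No timings are shown, since they depend on the machine and on the h5py/HDF5 versions in use.

```python
import os
import tempfile

import h5py

# Placeholder record IDs, as in the question; real IDs would be 36-character strings.
ids = ["id1", "id2", "id3"]

# Hypothetical scratch location for the example file.
path = os.path.join(tempfile.mkdtemp(), "ds.h5")

# libver="latest" allows HDF5 to use the newest file format it supports.
with h5py.File(path, "w", libver="latest") as h5:
    dset = h5.create_dataset("lds", (100000,), dtype="float32")
    for idx, rid in enumerate(ids):
        dset.attrs[rid] = idx  # map string ID -> row index

# Reopen read-only and fetch a record by its string ID.
with h5py.File(path, "r") as h5:
    dset = h5["lds"]
    row = int(dset.attrs["id1"])  # attribute value is the row index
    value = dset[row]             # nothing was written, so this is the default fill value
```

Note that a file written with libver='latest' may not be readable by older versions of the HDF5 library, so this is a trade-off between write speed and compatibility.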