h5py Writes Are Very Slow


Question

I have an h5py dataset like the one below. I want to index the records by string instead of by numeric value, so that, for example, I can get the value of the first record with dset[dset.attrs['id1']].

I am trying to write the attributes with the code below, but it is extremely slow. Running %timeit dset.attrs[rid] = idx inside the loop shows that a single write takes about 310 ms. The strings I am writing are 36 characters long. I have about 100k records to write, which would take about 9 hours. Something must be terribly wrong. The CPU is also pegged.

import h5py

ids = ['id1', 'id2', 'id3']  # record ids (36-character strings in the real data)
h5 = h5py.File("/tmp/ds.h5", "w")
dset = h5.create_dataset("lds", (100000, ), dtype='float32')

for idx, rid in enumerate(ids):  # loop takes forever
    dset.attrs[rid] = idx  # a single write takes about 310 ms

Edit

A minimal "working" example:

for idx, rid in enumerate(range(10)):
    %timeit dset.attrs[str(rid)] = idx

10 loops, best of 3: 470 ms per loop
10 loops, best of 3: 470 ms per loop
...

That is nearly 0.5 seconds for a single write.

Recommended answer

Pass libver='latest' when opening the file. This is a lot faster. For example:

h5py.File('ds.h5', 'w', libver='latest')

See: https://github.com/h5py/h5py/issues/705
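To check the effect on your own machine, the two settings can be timed side by side. The following is a minimal sketch rather than a benchmark: the helper names write_id_attrs and lookup, the temporary file paths, and the choice of 200 UUID strings (36 characters each, as in the question) are assumptions made for illustration. Actual timings depend on the HDF5 build and the number of attributes, so no output is shown.

```python
import os
import tempfile
import time
import uuid

import h5py


def write_id_attrs(path, ids, libver=None):
    """Create the dataset, store one attribute per record id, return elapsed seconds.

    libver=None is h5py's default; libver='latest' is the setting from the answer.
    """
    start = time.perf_counter()
    with h5py.File(path, "w", libver=libver) as h5:
        dset = h5.create_dataset("lds", (len(ids),), dtype="float32")
        for idx, rid in enumerate(ids):
            dset.attrs[rid] = idx  # map the string id to its row index
    return time.perf_counter() - start


def lookup(path, rid):
    """Read one record back by its string id (hypothetical helper)."""
    with h5py.File(path, "r") as h5:
        dset = h5["lds"]
        # Attributes come back as NumPy scalars; cast to a plain int index.
        return dset[int(dset.attrs[rid])]


if __name__ == "__main__":
    # 36-character ids, as described in the question (count kept small on purpose).
    ids = [str(uuid.uuid4()) for _ in range(200)]
    tmpdir = tempfile.mkdtemp()
    for libver in (None, "latest"):
        path = os.path.join(tmpdir, f"ds_{libver}.h5")
        elapsed = write_id_attrs(path, ids, libver=libver)
        print(f"libver={libver!r}: {elapsed:.3f} s for {len(ids)} attributes")
    print("first record:", lookup(path, ids[0]))
```

Note that files written with libver='latest' may not be readable by older HDF5 releases, so this setting trades compatibility for speed.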
