HDF5 file content disappears after writing process finishes


Problem description

I'm using h5py to iteratively write to a large array in Python. This takes quite a long time, and I can watch the file size grow as the code runs.

Unfortunately, when my Python program exits, the file content disappears. The file is not corrupt, but all values are 0.0 (the fill value I set). I made sure the file f is closed with f.close(), and after closing the file (but before exiting the program) the file was still intact and the content was there.

Is anyone familiar with this behaviour and can explain what happens here? I'd appreciate any help!

To give you a bit more information, here is what I do specifically. I created a Process that processes results from a Queue. When the process is initialised, the HDF5 file is created, and when the last item in the queue is reached, the file is closed. All of this seems to work fine (as described above), but I'm mentioning it because I don't have much experience with processes and wonder whether the file handling in the process class could be the problem.

from multiprocessing import Process, Queue
import h5py

class ResultProcessor(Process):

    def __init__(self, result_queue, result_file):
        Process.__init__(self)
        self.result_queue = result_queue
        self.daemon = True

        #open result file handle ('w')
        self.f = h5py.File(result_file, 'w')
        # num_jobs and num_subjects are assumed to be defined elsewhere
        self.dset = self.f.create_dataset('zipped', (num_jobs, num_subjects), compression="gzip", fillvalue=0)

    def run(self):
        while True:
            next_result = self.result_queue.get()

            if next_result is None:
                # Poison pill means we should exit
                self.f.close()
                return

            idx, result = next_result
            self.dset[idx,:] = result

The process is then initialised and run as below:

# result_queue is still empty
result_processor = ResultProcessor(result_queue, file_name)
result_processor.start()

# now the result queue is filled
process_stuff_and_feed_to_result_queue()
# add last queue item so the end can be recognised:
result_queue.put(None)

result_processor.join()

# I checked at this point: The file content is still around!


Answer

While this won't explain why the file's contents appear to disappear, you should keep in mind that HDF5 (and hence h5py) is not designed for multiple processes (which is what using multiprocessing usually amounts to) writing to the same file. HDF5 1.10 adds MPI support and SWMR (single-writer, multiple-reader), but even then you do not have complete freedom to write anything in any order.
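One thing worth noting about the question's code: `__init__` runs in the parent process, so the `h5py.File` handle is created there, while `run()` executes in the child. A plausible fix (an assumption, not stated in the answer above) is to open the file inside `run()`, so the handle is created, written, and closed entirely within the writer process. A minimal sketch of the class modified this way; `num_jobs` and `num_subjects` become constructor arguments here, since they are not defined in the original snippet:

```python
from multiprocessing import Process, Queue
import h5py

class ResultProcessor(Process):

    def __init__(self, result_queue, result_file, num_jobs, num_subjects):
        Process.__init__(self)
        self.result_queue = result_queue
        self.result_file = result_file
        self.num_jobs = num_jobs
        self.num_subjects = num_subjects
        self.daemon = True

    def run(self):
        # Open the file here, in the child process, so no handle is
        # shared across the process boundary.
        with h5py.File(self.result_file, 'w') as f:
            dset = f.create_dataset('zipped',
                                    (self.num_jobs, self.num_subjects),
                                    compression="gzip", fillvalue=0)
            while True:
                next_result = self.result_queue.get()
                if next_result is None:
                    # Poison pill; the 'with' block closes the file.
                    return
                idx, result = next_result
                dset[idx, :] = result
```

Using a `with` block also guarantees the file is flushed and closed even if `run()` raises, which a bare `f.close()` call does not.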
