Can we disable h5py file locking for a Python file-like object?


Question

When opening an HDF5 file with h5py, you can pass in a Python file-like object. I have done so, where the file-like object is a custom implementation of my own network-based transport layer.

This works great: I can slice large HDF5 files over a high-latency transport layer. However, HDF5 appears to provide its own file locking, so that even if you open multiple files read-only within the same process (using threads), the operations still run, effectively, in series.

There are drivers in HDF5 that support parallel operations, such as h5py.File(f, driver='mpio'), but this doesn't appear to apply to Python file-like objects, which use h5py.File(f, driver='fileobj').

The only solution I see is to use multiprocessing. However, its scalability is very limited: because of overhead, you can realistically open only tens of processes. My transport layer uses asyncio and is capable of parallel operations on the scale of thousands or tens of thousands, allowing me to build a longer queue of slow file-read operations, which boosts my total throughput.

I can achieve 1.5 GB/sec of large-file, random-seek, binary reads with my transport layer against a local S3 interface when I queue 10k IO operations in parallel (requiring 50 GB of RAM to service the requests, an acceptable trade-off for the throughput).

Is there any way I can disable the h5py file locking when using driver='fileobj'?

Recommended answer

You just need to set the environment variable HDF5_USE_FILE_LOCKING to FALSE.

For example:

On Linux or macOS, in a terminal: export HDF5_USE_FILE_LOCKING=FALSE

On Windows, in Command Prompt (CMD): set HDF5_USE_FILE_LOCKING=FALSE
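The same variable can also be set from inside the Python process instead of the shell. The sketch below is a minimal illustration, not part of the original answer: the helper name `disable_hdf5_file_locking` is hypothetical, and setting the variable before h5py is first imported is a precaution, since the point at which HDF5 reads it may depend on the library version. Whether this removes the serialization described in the question is not confirmed here.

```python
import os


def disable_hdf5_file_locking() -> str:
    """Set HDF5_USE_FILE_LOCKING=FALSE for the current process.

    Hypothetical helper for illustration; it only touches os.environ.
    """
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return os.environ["HDF5_USE_FILE_LOCKING"]


# To be safe, call this before h5py is imported anywhere in the process.
value = disable_hdf5_file_locking()
print(f"HDF5_USE_FILE_LOCKING={value}")

# Afterwards (assumed usage, requires h5py to be installed):
# import h5py
# h5 = h5py.File(f, driver='fileobj')  # f is your own file-like object
```

Child processes started after this call inherit the setting, because they receive a copy of the parent's environment.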
