How can I open multiple files (number of files unknown beforehand) using the "with open" statement?
Question
I specifically need to use the with open statement to open the files, because I need to open a few hundred files together and merge them using a K-way merge. I understand that, ideally, I should have kept K low, but I did not foresee this problem.
Starting from scratch is not an option now, as I have a deadline to meet. So at this point, I need very fast I/O that does not hold a whole file, or a large portion of one, in memory (because there are hundreds of files, each of ~10MB). I just need to read one line at a time for the K-way merge. Reducing memory usage is my primary focus right now.
I learned that with open is the most efficient technique, but I cannot understand how to open all the files together in a single with open statement. Excuse my beginner ignorance!
Update: This problem was solved. It turns out the issue was not about how I was opening the files at all. I found out that the excessive memory usage was due to inefficient garbage collection. I did not use with open at all. I used the regular f=open() and f.close(). Garbage collection saved the day.
Recommended answer
It's fairly easy to write your own context manager to handle this by using the standard library's contextlib.contextmanager function decorator to define "a factory function for with statement context managers", as the documentation puts it. For example:
from contextlib import contextmanager

@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        # Runs even if the body of the with statement raises.
        for file in files:
            file.close()

if __name__ == '__main__':
    filenames = 'file1', 'file2', 'file3'
    with multi_file_manager(filenames) as files:
        a = files[0].readline()
        b = files[2].readline()
        ...
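As an alternative to a hand-written manager like multi_file_manager, the standard library's contextlib.ExitStack can hold an arbitrary number of open files inside a single with statement. The following is only a minimal sketch: the helper name read_first_lines and the temporary demo files are made up for illustration and are not part of the original answer.

```python
import os
import tempfile
from contextlib import ExitStack

def read_first_lines(paths):
    """Open every path in one with-block and return each file's first line.

    Hypothetical helper, for illustration only.
    """
    with ExitStack() as stack:
        # enter_context() registers each file with the stack, so all files
        # opened so far are closed when the block exits, even if a later
        # open() call fails.
        files = [stack.enter_context(open(p, 'rt')) for p in paths]
        return [f.readline() for f in files]

if __name__ == '__main__':
    # Create a few throwaway files so the sketch can be run as-is.
    tmpdir = tempfile.mkdtemp()
    demo_paths = []
    for i in range(3):
        path = os.path.join(tmpdir, f'demo{i}.txt')
        with open(path, 'wt') as f:
            f.write(f'first line of file {i}\nsecond line\n')
        demo_paths.append(path)
    print(read_first_lines(demo_paths))
```

Because the list of paths is built at run time, the number of files does not need to be known when the code is written.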
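If you don't know all the files beforehand, it is just as easy to create a context manager that supports adding them incrementally within the context. In the code below, contextlib.ContextDecorator is used as the base class to simplify the implementation of the MultiFileManager class.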
from contextlib import ContextDecorator

class MultiFileManager(ContextDecorator):
    def __init__(self, files=None):
        self.files = [] if files is None else files

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close every file that was registered while inside the context.
        for file in self.files:
            file.close()

    def __iadd__(self, other):
        """Add file to be closed when leaving context."""
        self.files.append(other)
        return self

if __name__ == '__main__':
    filenames = 'mfm_file1.txt', 'mfm_file2.txt', 'mfm_file3.txt'
    with MultiFileManager() as mfmgr:
        for count, filename in enumerate(filenames, start=1):
            file = open(filename, 'w')
            mfmgr += file  # Add file to be closed later.
            file.write(f'this is file {count}\n')
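For the K-way merge that motivated the question, one possible approach is to combine contextlib.ExitStack with heapq.merge, which consumes its inputs lazily. This is a sketch under stated assumptions, not the asker's actual solution: the function name merge_sorted_files is hypothetical, each input file is assumed to be already sorted line by line, and every line is assumed to end with a newline.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack

def merge_sorted_files(input_paths, output_path):
    """K-way merge of line-sorted text files into output_path.

    Hypothetical helper; assumes each input is already sorted and that
    every line ends with a newline.
    """
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(p, 'rt')) for p in input_paths]
        out = stack.enter_context(open(output_path, 'wt'))
        # File objects iterate line by line and heapq.merge pulls from them
        # lazily, so only about one pending line per input (plus each
        # file's read buffer) is held in memory at a time.
        out.writelines(heapq.merge(*inputs))

if __name__ == '__main__':
    # Build three small sorted files so the sketch can be run as-is.
    tmpdir = tempfile.mkdtemp()
    paths = []
    for i, lines in enumerate([['a\n', 'd\n'], ['b\n', 'e\n'], ['c\n']]):
        path = os.path.join(tmpdir, f'sorted{i}.txt')
        with open(path, 'wt') as f:
            f.writelines(lines)
        paths.append(path)
    merged = os.path.join(tmpdir, 'merged.txt')
    merge_sorted_files(paths, merged)
    with open(merged, 'rt') as f:
        print(f.read())
```

The operating system's limit on simultaneously open files still applies, so with several hundred inputs it may be worth checking that limit first.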