How can I open multiple files (number of files unknown beforehand) using the "with open" statement?

Problem description

I specifically need to use the with open statement to open the files, because I need to open a few hundred files together and merge them using a K-way merge. I understand that ideally I should have kept K low, but I did not foresee this problem.

Starting from scratch is not an option now, as I have a deadline to meet. So at this point I need very fast I/O that does not store the whole file, or a large portion of it, in memory (there are hundreds of files, each about 10 MB). I only need to read one line at a time for the K-way merge. Reducing memory usage is my primary focus right now.

I learned that with open is the most efficient technique, but I cannot work out how to open all the files together in a single with open statement. Excuse my beginner ignorance!

Update: This problem was solved. It turns out the issue was not about how I was opening the files at all. I found that the excessive memory usage was due to inefficient garbage collection. I did not use with open at all; I used the regular f = open() and f.close(). Garbage collection saved the day.

Recommended answer

It's fairly easy to write your own context manager to handle this by using the standard library's contextlib.contextmanager function decorator to define "a factory function for with statement context managers", as the documentation puts it. For example:

from contextlib import contextmanager

@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        # Close every file even if the body of the with-block raised.
        for file in files:
            file.close()


if __name__ == '__main__':

    filenames = 'file1', 'file2', 'file3'

    with multi_file_manager(filenames) as files:
        a = files[0].readline()
        b = files[2].readline()
        ...
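
As an alternative that the answer above does not mention, the standard library's contextlib.ExitStack can manage a number of files that is only known at run time. The following is a minimal sketch, assuming the files are readable as text; the helper name read_first_lines is hypothetical and only serves the illustration:

```python
from contextlib import ExitStack


def read_first_lines(filenames):
    """Open every file in filenames at once and return each file's first line.

    Hypothetical helper for illustration; not part of the original answer.
    """
    with ExitStack() as stack:
        # enter_context() registers each file with the stack, so all files
        # opened so far are closed when the block exits, even if a later
        # open() call raises.
        files = [stack.enter_context(open(name, 'rt')) for name in filenames]
        return [f.readline() for f in files]
```

Because each file is registered as soon as it is opened, a failure partway through the list should not leave the earlier files open.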


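If you don't know all the files beforehand, it is just as easy to create a context manager that supports adding them incrementally inside the context. In the code below, contextlib.ContextDecorator is used as the base class to simplify the implementation of the MultiFileManager class.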

from contextlib import ContextDecorator

class MultiFileManager(ContextDecorator):
    def __init__(self, files=None):
        self.files = [] if files is None else files

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        for file in self.files:
            file.close()

    def __iadd__(self, other):
        """Add file to be closed when leaving context."""
        self.files.append(other)
        return self


if __name__ == '__main__':

    filenames = 'mfm_file1.txt', 'mfm_file2.txt', 'mfm_file3.txt'

    with MultiFileManager() as mfmgr:
        for count, filename in enumerate(filenames, start=1):
            file = open(filename, 'w')
            mfmgr += file  # Add file to be closed later.
            file.write(f'this is file {count}\n')
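
For the K-way merge described in the question, the file objects handed out by a context manager like the ones above could be passed to heapq.merge, which consumes its inputs lazily. This is only a sketch under stated assumptions: every input is already sorted, every line ends with a newline, and the function name merge_sorted_files is hypothetical.

```python
import heapq


def merge_sorted_files(files, out):
    """Write the lines of already-sorted, open text files to out in sorted order.

    Hypothetical helper for illustration; assumes each input is sorted
    and that every line ends with a newline.
    """
    # heapq.merge pulls one line at a time from each input, so the merge
    # itself keeps only one pending line per file.
    for line in heapq.merge(*files):
        out.write(line)
```

Any iterable of lines works as input, so the function can be tried with io.StringIO objects before pointing it at real files. Note that file objects still use their own read buffers, so memory use per file is small but not literally a single line.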
