What's the most pythonic way to iterate over all the lines of multiple files?


Question


I want to treat many files as if they were all one file. What's the proper pythonic way to take [filenames] => [file objects] => [lines] with generators/not reading an entire file into memory?

We all know the proper way to open a file:

with open("auth.log", "rb") as f:
    print(len(f.readlines()))

And we know the correct way to link several iterators/generators into one long one:

>>> list(itertools.chain(range(3), range(3)))
[0, 1, 2, 0, 1, 2]

but how do I link multiple files together and preserve the context managers?

with open("auth.log", "rb") as f0:
    with open("auth.log.1", "rb") as f1:
        for line in itertools.chain(f0, f1):
            do_stuff_with(line)

    # f1 is now closed
# f0 is now closed
# gross

I could ignore the context managers and do something like this, but it doesn't feel right:

files = itertools.chain(*(open(f, "rb") for f in file_names))
for line in files:
    do_stuff_with(line)
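For what it's worth, Python 3.3+ offers `contextlib.ExitStack`, which keeps every file under a context manager without the nested `with` blocks; a minimal sketch (the name `chained_lines` is made up for illustration):

```python
import itertools
from contextlib import ExitStack  # Python 3.3+


def chained_lines(file_names):
    # ExitStack registers each file's context manager and closes them
    # all (in reverse order) when the with-block exits, even if an
    # exception is raised mid-iteration.
    with ExitStack() as stack:
        files = [stack.enter_context(open(name, "r")) for name in file_names]
        for line in itertools.chain.from_iterable(files):
            yield line
```

Note that, unlike a generator that opens one file at a time, this opens every file up front and holds them all open until iteration ends.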

Or is this kind of what Async IO - PEP 3156 is for and I'll just have to wait for the elegant syntax later?

Solution

There's always fileinput.

for line in fileinput.input(filenames):
    ...
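To see it in action (the file names and contents below are invented for the demo), the lines come out exactly as if the files had been concatenated:

```python
import fileinput
import os
import tempfile

# Two throwaway files standing in for auth.log and auth.log.1.
tmpdir = tempfile.mkdtemp()
names = [os.path.join(tmpdir, "auth.log"), os.path.join(tmpdir, "auth.log.1")]
with open(names[0], "w") as f:
    f.write("alpha\nbeta\n")
with open(names[1], "w") as f:
    f.write("gamma\n")

lines = list(fileinput.input(names))
print(lines)  # ['alpha\n', 'beta\n', 'gamma\n']
```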

Reading the source, however, it appears that fileinput.FileInput can't be used as a context manager [1]. To fix that, you could use contextlib.closing, since FileInput instances have a sanely implemented close method:

from contextlib import closing
with closing(fileinput.input(filenames)) as line_iter:
    for line in line_iter:
        ...


An alternative that preserves context-manager behaviour is to write a simple generator function that loops over the files and yields lines as it goes:

def fileinput(files):
    for f in files:
        with open(f, 'r') as fin:
            for line in fin:
                yield line

No real need for itertools.chain here, IMHO. The magic is in the yield statement, which turns an ordinary function into a fantastically lazy generator: each file is opened only when the previous one has been exhausted, and its with block closes it before the next one is opened.
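That laziness is easy to demonstrate (self-contained sketch; `lazy_lines` repeats the generator above under a hypothetical name so the demo stands on its own). The second path below doesn't even exist, but since iteration stops inside the first file, it is never opened and no error is raised:

```python
import os
import tempfile
from itertools import islice


def lazy_lines(files):
    # Same shape as the fileinput() generator above.
    for name in files:
        with open(name, "r") as fin:
            for line in fin:
                yield line


tmpdir = tempfile.mkdtemp()
real = os.path.join(tmpdir, "a.log")
with open(real, "w") as f:
    f.write("1\n2\n3\n")

# islice stops after two lines, all from the first file, so the
# nonexistent second file is never opened.
first_two = list(islice(lazy_lines([real, os.path.join(tmpdir, "missing.log")]), 2))
print(first_two)  # ['1\n', '2\n']
```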


[1] As an aside, starting with Python 3.2, fileinput.FileInput is implemented as a context manager, which does exactly what we did above with contextlib.closing. Now our example becomes:

# Python 3.2+ version
with fileinput.input(filenames) as line_iter:
    for line in line_iter:
        ...

although the other example will work on Python 3.2+ as well.
