How should I read a file line-by-line in Python?


Question

In pre-historic times (Python 1.4) we did:

fp = open('filename.txt')
while 1:
    line = fp.readline()
    if not line:
        break
    print line

After Python 2.1, we did:

for line in open('filename.txt').xreadlines():
    print line

Then we got the convenient iterator protocol in Python 2.3, and could do:

for line in open('filename.txt'):
    print line

I've seen some examples using the more verbose:

with open('filename.txt') as fp:
    for line in fp:
        print line
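For reference, the same pattern in modern Python 3, where `print` is a function (the sample file is created inline so the sketch is self-contained):

```python
# Create a small sample file so the example is runnable on its own.
with open('filename.txt', 'w') as fp:
    fp.write('first line\nsecond line\n')

# The with block guarantees the file is closed when the block exits.
with open('filename.txt') as fp:
    for line in fp:
        print(line, end='')  # each line keeps its trailing newline
```

After the block, `fp.closed` is `True` regardless of how the loop exited.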

is this the preferred method going forwards?

[edit] I get that the with statement ensures closing of the file... but why isn't that included in the iterator protocol for file objects?

Answer

There is exactly one reason why the following is preferred:

with open('filename.txt') as fp:
    for line in fp:
        print line

We are all spoiled by CPython's relatively deterministic reference-counting scheme for garbage collection. Other, hypothetical implementations of Python will not necessarily close the file "quickly enough" without the with block if they use some other scheme to reclaim memory.

In such an implementation, you might get a "too many files open" error from the OS if your code opens files faster than the garbage collector calls finalizers on orphaned file handles. The usual workaround is to trigger the GC immediately, but this is a nasty hack and it has to be done by every function that could encounter the error, including those in libraries. What a nightmare.
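The workaround described above can be sketched as follows. This is illustrative only: on CPython the `except` branch would essentially never fire, and the function name, the retry-once structure, and the file name are assumptions for the sake of the example.

```python
import gc

def read_first_line(path):
    # Illustrative sketch of the "nasty hack" described above, for a
    # hypothetical Python implementation without prompt refcounting.
    # (The bare open() deliberately leaks the handle; that leak is
    # exactly what the hack papers over.)
    try:
        return open(path).readline()
    except OSError:
        gc.collect()  # force a collection so orphaned handles get finalized
        return open(path).readline()  # then retry once

# Demo file (name assumed for illustration).
with open('demo.txt', 'w') as out:
    out.write('hello\n')

line = read_first_line('demo.txt')
```

Every function that opens files would need this retry dance, which is why the answer calls it a nightmare.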

Or you could just use the with block.
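Spelled out, the with block is deterministic shorthand for a try/finally on any implementation (Python 3 syntax; the sample file is created inline so the sketch runs on its own):

```python
# Sample file so the sketch is self-contained.
with open('filename.txt', 'w') as out:
    out.write('one\ntwo\n')

# What the with block buys you, written out by hand: the close is
# guaranteed even if the loop body raises.
fp = open('filename.txt')
try:
    for line in fp:
        print(line, end='')
finally:
    fp.close()
```

`with open(...) as fp:` does exactly this, with less room for error.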

(Stop reading now if you are only interested in the objective aspects of the question.)

Why isn't that included in the iterator protocol for file objects?

This is a subjective question about API design, so I have a subjective answer in two parts.

On a gut level, this feels wrong, because it would make the iterator protocol do two separate things (iterate over lines and close the file handle), and it's often a bad idea to make a simple-looking function perform two actions. In this case it feels especially bad, because iterators relate in a quasi-functional, value-based way to the contents of a file, while managing file handles is a completely separate task. Squashing both, invisibly, into one action is surprising to humans who read the code and makes it harder to reason about program behavior.

Other languages have essentially come to the same conclusion. Haskell briefly flirted with so-called "lazy IO", which allows you to iterate over a file and have it automatically closed when you reach the end of the stream, but lazy IO is almost universally discouraged in Haskell these days, and Haskell users have mostly moved to more explicit resource management such as Conduit, which behaves more like the with block in Python.

On a technical level, there are some things you may want to do with a file handle in Python which would not work as well if iteration closed the file handle. For example, suppose I need to iterate over the file twice:

with open('filename.txt') as fp:
    for line in fp:
        ...
    fp.seek(0)
    for line in fp:
        ...

While this is a less common use case, consider the fact that I might have just added the three lines of code at the bottom to an existing code base which originally had the top three lines. If iteration closed the file, I wouldn't be able to do that. So keeping iteration and resource management separate makes it easier to compose chunks of code into a larger, working Python program.
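As a concrete version of the two-pass pattern above (Python 3 syntax; the file contents are assumed for illustration):

```python
# Sample file so the example is self-contained.
with open('filename.txt', 'w') as out:
    out.write('alpha\nbeta\ngamma\n')

with open('filename.txt') as fp:
    count = sum(1 for _ in fp)  # first pass: count the lines
    fp.seek(0)                  # rewind: iteration did not close fp
    first = fp.readline()       # second pass starts back at the top
```

If exhausting the iterator had closed the file, the `seek(0)` would raise `ValueError: I/O operation on closed file` and the second pass would be impossible.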

Composability is one of the most important usability features of a language or API.
