Simple way of NOT reading the last N lines of a file in Python


Question

I'd like to read a file line by line, except for the last N lines. In Python, how do I know where to stop without reaching the end of the file and then backtracking or discarding the last N lines? Is counting the lines (X) and then looping over the first X - N of them a good way to go about this?

What's the simplest / most Pythonic way of doing this?

Recommended answer

Three different solutions:

1) Quick and dirty, see John's answer:

with open(file_name) as fid:
    lines = fid.readlines()
# Use an explicit end index: lines[:-n_skip] would be empty when n_skip == 0
for line in lines[:max(len(lines) - n_skip, 0)]:
    do_something_with(line)

The disadvantage of this method is that you have to read all lines into memory first, which might be a problem for big files.

2) Two passes

Process the file twice: once to count the number of lines n_lines, and a second time to process only the first n_lines - n_skip lines:

# first pass to count
with open(file_name) as fid:
    n_lines = sum(1 for line in fid)

# second pass to actually do something
with open(file_name) as fid:
    for i_line in range(n_lines - n_skip):  # does nothing if n_lines <= n_skip (xrange on Python 2)
        line = fid.readline()
        do_something_with(line)

The disadvantage of this method is that you have to iterate over the file twice, which might be slower in some cases. The good thing, however, is that you never have more than one line in memory.
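The two-pass idea can also be written with a single file handle that is rewound between passes. The following is only a sketch: the function name process_all_but_last and its handle callback are made up for illustration and are not part of the original answer. It assumes the file is seekable (a regular file on disk, not a pipe).

```python
from itertools import islice


def process_all_but_last(path, n_skip, handle):
    """Call handle(line) for every line of the file except the last n_skip.

    Hypothetical helper: the name and the callback parameter are
    illustrative assumptions, not taken from the original answer.
    """
    with open(path) as fid:
        # First pass: count the lines without storing them.
        n_lines = sum(1 for _ in fid)
        # Rewind to the start for the second pass (requires a seekable file).
        fid.seek(0)
        # Second pass: stop n_skip lines before the end.
        # max(..., 0) keeps islice from receiving a negative stop value.
        for line in islice(fid, max(n_lines - n_skip, 0)):
            handle(line)
```

Calling process_all_but_last(file_name, n_skip, do_something_with) would then behave like the two-pass loop above, while opening the file only once.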

3) Use a buffer, similar to Serge's solution

If you want to iterate over the file just once, you can only be sure that line i is safe to process once you know that line i + n_skip exists. This means you have to hold n_skip lines in a temporary buffer first. One way to do this is to implement some sort of FIFO buffer (e.g. with a generator function that implements a circular buffer):

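def fifo(it, n):
    """Yield every item of `it` except the last n, holding at most n items."""
    if n <= 0:  # nothing to hold back; also avoids a modulo by zero below
        for item in it:
            yield item
        return
    buffer = [None] * n  # preallocate circular buffer
    i = 0
    full = False
    for item in it:  # leaves last n items in buffer when iterator is exhausted
        if full:
            yield buffer[i]  # yield old item before storing new item
        buffer[i] = item
        i = (i + 1) % n
        if i == 0:  # wrapped around at least once
            full = True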

A quick test with a range of numbers:

In [12]: for i in fifo(range(20), 5):
    ...:     print(i, end=" ")
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

The way you would use this with your file:

with open(file_name) as fid:
    for line in fifo(fid, n_skip):
        do_something_with(line)

Note that this requires enough memory to temporarily store n_skip lines, but this is still better than reading all lines into memory as in the first solution.
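The same single-pass buffering can be sketched with collections.deque from the standard library instead of a hand-written circular buffer. This is an alternative formulation, not the code from the original answer, and the name skip_last is an assumption chosen for illustration (Python 3 syntax).

```python
from collections import deque
from itertools import islice


def skip_last(iterable, n):
    """Yield every item of `iterable` except the last n.

    Hypothetical name; a deque-based variant of the FIFO buffer idea.
    """
    it = iter(iterable)
    if n <= 0:
        # Nothing to hold back: pass everything through.
        yield from it
        return
    # Prime the window with the first n items (fewer if the input is short).
    window = deque(islice(it, n))
    for item in it:
        # A new item arrived, so the oldest buffered item is not among the last n.
        yield window.popleft()
        window.append(item)
    # When the input is exhausted, the last n items remain in `window`
    # and are never yielded.
```

With a file this would read: for line in skip_last(fid, n_skip): do_something_with(line). As with the circular buffer, at most n_skip lines are held in memory at any time.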

Which of these three methods is best is a trade-off between code complexity, memory and speed, and depends on your exact application.
