How to read lines from a file in Python starting from the end


Problem description

I need to know how to read lines from a file in Python so that I read the last line first and continue in that fashion until the cursor reaches the beginning of the file. Any ideas?

Recommended answer

The general problem here, reading a text file in reverse, line by line, can be solved by at least three methods.

The underlying difficulty is that since each line can have a different length, you can't know beforehand where each line starts in the file, nor how many lines there are. This means you need to apply some logic to the problem.

With the first approach, you simply read the entire file into memory, into some data structure that subsequently allows you to process the list of lines in reverse. A stack, a doubly linked list, or even an array can do this.

Pros: Really easy to implement (probably built into Python for all I know)
Cons: Uses a lot of memory, can take a while to read large files
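A minimal sketch of this first approach, assuming the file fits comfortably in memory. The function name `read_lines_reversed` and the default `utf-8` encoding are illustrative choices, not part of the original answer.

```python
def read_lines_reversed(path, encoding="utf-8"):
    """Return all lines of a text file, last line first.

    The whole file is loaded into a list, so memory use grows
    with the size of the file.
    """
    with open(path, encoding=encoding) as f:
        # Strip the trailing newline from each line as it is read.
        lines = [line.rstrip("\n") for line in f]
    # A Python list works as the "array" here; reverse it in one step.
    lines.reverse()
    return lines
```

A plain list is enough here because Python lists can be reversed in place; no dedicated stack or linked list is needed.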

With the second approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you store only the byte positions inside the file where each line starts. You can store these positions in a data structure similar to the one that holds the lines in the first approach.

Whenever you want to read line X, you have to re-read the line from the file, starting at the position you stored for the start of that line.

Pros: Almost as easy to implement as the first approach
Cons: Can take a while to read large files
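The position-based approach could be sketched roughly as follows. The helper names `index_line_offsets` and `read_lines_reversed_by_offset` are hypothetical, and the sketch assumes lines end in `\n` (optionally preceded by `\r`) and that the file is opened in binary mode so the offsets are true byte positions.

```python
def index_line_offsets(path):
    """Scan the file once and return the byte offset where each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)  # len() of bytes is the line's size on disk
    return offsets


def read_lines_reversed_by_offset(path, encoding="utf-8"):
    """Yield lines last to first by seeking to each stored offset."""
    offsets = index_line_offsets(path)
    with open(path, "rb") as f:
        for start in reversed(offsets):
            f.seek(start)
            # Re-read just this one line from its recorded start position.
            yield f.readline().decode(encoding).rstrip("\r\n")
```

Only one integer per line is kept in memory, at the cost of one seek and one read per line on the way back.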

With the third approach, you read the file block by block, starting from the end, and look for the line endings. You basically have a buffer of, say, 4096 bytes, and process the last line of that buffer. When your processing, which has to move backward one line at a time in that buffer, reaches the start of the buffer, you need to read another buffer's worth of data from the area just before the buffer you last read, and continue processing.

This approach is generally more complicated, because you need to handle things such as lines being broken across two buffers, and long lines can even span more than two buffers.

It is, however, the approach that requires the least amount of memory, and for really large files it might also be worth doing to avoid reading through gigabytes of information first.

Pros: Uses little memory, does not require you to read the entire file first
Cons: Much harder to implement and get right for all corner cases
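One possible sketch of the block-wise approach is shown below. It is an illustration under stated assumptions, not a definitive implementation: the name `reverse_lines` and its parameters are hypothetical, lines are assumed to end in `\n` (optionally preceded by `\r`), and the encoding is assumed to be UTF-8 or another encoding in which the byte `0x0A` only ever means a newline.

```python
import os


def reverse_lines(path, buf_size=4096, encoding="utf-8"):
    """Yield the lines of a file from last to first, reading blocks from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""           # start of a line whose beginning is in an earlier block
        first_block = True
        while pos > 0:
            read_size = min(buf_size, pos)
            pos -= read_size
            f.seek(pos)
            # Prepend the new block to whatever partial line was carried over.
            block = f.read(read_size) + tail
            if first_block:
                # Ignore one trailing newline so it does not create an empty last line.
                if block.endswith(b"\n"):
                    block = block[:-1]
                first_block = False
            lines = block.split(b"\n")
            # The first piece may be incomplete; keep it until more data is read.
            tail = lines[0]
            for line in reversed(lines[1:]):
                yield line.decode(encoding).rstrip("\r")
        if not first_block:
            # Whatever remains is the first line of the file.
            yield tail.decode(encoding).rstrip("\r")
```

Lines are decoded only once they are complete, so a multi-byte character split across two buffers is not a problem in this sketch. A line longer than `buf_size` simply accumulates in `tail` across several reads.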

There are numerous links on the net that show how to do the third approach.
