如何从Python的文件结尾开始读取文件 [英] How to read lines from a file in python starting from the end

查看:245
本文介绍了如何从Python的文件结尾开始读取文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要知道如何从Python中的文件中读取行,以便我先读取最后一行,然后继续,直到光标到达文件的开头。有任何想法吗?

解决方案

这个问题的一般方法是以相反的顺序读取文本文件,至少可以解决三个问题方法。

一般的问题是由于每一行可以有不同的长度,所以你不能事先知道每一行在文件中的起始位置,也不知道它们中有多少行有。这意味着你需要应用一些逻辑来解决问题。

一般方法#1:将整个文件读入内存

使用这种方法,只需将整个文件读入内存,然后在一些数据结构中,随后就可以反向处理行列表。一个堆栈,一个双向链表,甚至一个数组都可以做到这一点。



优点:真的很容易实现所有我知道)

缺点:使用大量的内存,可能需要一段时间才能读取大文件

方法#2:读取整个文件,存储行的位置

使用这种方法,您还可以读取整个文件一次,而不是存储整个文件文本)在内存中,只存储每行开始的文件内的二进制位置。您可以将这些位置存储在与第一种方法中存储行相似的数据结构中。

无论您想要读取X行,都必须重新读取从文件开始的行,从你存储的位置开始。



优点:几乎与第一种方法

缺点:可能需要一段时间才能读取大文件

一般方法#3:读取文件反过来,然后找出来。

使用这种方法,您将从结尾读取文件块或类似文件,并查看结尾的位置。你基本上有一个缓冲区,例如4096字节,并处理该缓冲区的最后一行。当你的处理必须在缓冲区中一次向后移动一行时,你需要从读取的第一个缓冲区之前的区域读取另一个缓冲区的数据,然后继续处理。

这种方法通常比较复杂,因为您需要处理诸如两行缓冲区之间的行,而长行甚至可以覆盖两个以上的缓冲区。 p>

然而,它需要最少的内存,而对于真正大的文件,这可能也是值得的,以避免先读取千兆字节的信息。
$ b

优点:使用小内存,不需要先读取整个文件

缺点:很难实现,适合所有的角落案例






网络上有很多链接显示如何做第三种方法:


I need to know how to read lines from a file in python so that I read the last line first and continue in that fashion until the cursor reach's the beginning of the file. Any idea's?

解决方案

The general approach to this problem, reading a text file in reverse, line-wise, can be solved by at least three methods.

The general problem is that since each line can have a different length, you can't know beforehand where each line starts in the file, nor how many of them there are. This means you need to apply some logic to the problem.

General approach #1: Read the entire file into memory

With this approach, you simply read the entire file into memory, in some data structure that subsequently allows you to process the list of lines in reverse. A stack, a doubly linked list, or even an array can do this.

Pros: Really easy to implement (probably built into Python for all I know)
Cons: Uses a lot of memory, can take a while to read large files

General approach #2: Read the entire file, store position of lines

With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the binary positions inside the file where each line started. You can store these positions in a similar data structure as the one storing the lines in the first approach.

Whever you want to read line X, you have to re-read the line from the file, starting at the position you stored for the start of that line.

Pros: Almost as easy to implement as the first approach
Cons: can take a while to read large files

General approach #3: Read the file in reverse, and "figure it out"

With this approach you will read the file block-wise or similar, from the end, and see where the ends are. You basically have a buffer, of say, 4096 bytes, and process the last line of that buffer. When your processing, which has to move one line at a time backward in that buffer, comes to the start of the buffer, you need to read another buffer worth of data, from the area before the first buffer you read, and continue processing.

This approach is generally more complicated, because you need to handle such things as lines being broken over two buffers, and long lines could even cover more than two buffers.

It is, however, the one that would require the least amount of memory, and for really large files, it might also be worth doing this to avoid reading through gigabytes of information first.

Pros: Uses little memory, does not require you to read the entire file first
Cons: Much hard to implement and get right for all corner cases


There are numerous links on the net that shows how to do the third approach:

这篇关于如何从Python的文件结尾开始读取文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆