What is the most efficient way to get the first and last line of a text file?


Problem description



I have a text file that contains a timestamp on each line. My goal is to find the time range. All the times are in order, so the first line holds the earliest time and the last line holds the latest time. I only need the very first and very last line. What would be the most efficient way to get these lines in Python?

Note: These files are fairly large, about 1-2 million lines each, and I have to do this for several hundred files.

Solution

See the docs for the io module (seek and its whence argument).

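import os

with open(fname, 'rb') as fh:
    first = next(fh).decode()

    # Seek to 1024 bytes before the end (or to the start if the file is smaller),
    # then take the last line in that window (complete only if shorter than 1024 bytes).
    fh.seek(-min(1024, os.path.getsize(fname)), os.SEEK_END)
    last = fh.readlines()[-1].decode()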

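The value 1024 is the number of bytes read from the end of the file. It has to be at least as large as the last line (in bytes); otherwise readlines()[-1] returns only the tail of that line. 1024 is just an example: if you have an estimate of the average line length, twice that value is a reasonable choice.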
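If you would rather not guess the offset at all, one alternative (not part of the original answer) is to read backwards from the end in fixed-size chunks until a newline that precedes the last line turns up. The sketch below assumes ASCII or UTF-8 text with \n or \r\n line endings; the function name first_and_last_line and its chunk_size parameter are hypothetical.

```python
import os


def first_and_last_line(path, chunk_size=1024):
    """Return (first, last) lines of a text file without reading the middle.

    Hypothetical helper. Assumes ASCII/UTF-8 text with \\n or \\r\\n endings;
    trailing newlines at the very end of the file are ignored.
    """
    with open(path, "rb") as fh:
        first = fh.readline()
        size = fh.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if size == 0:
            return "", ""
        buf = b""
        pos = size
        # Walk backwards one chunk at a time until the buffer (minus trailing
        # newlines) contains a newline, i.e. it holds the entire last line.
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            fh.seek(pos)
            buf = fh.read(step) + buf
            if b"\n" in buf.rstrip(b"\r\n"):
                break
        last = buf.rstrip(b"\r\n").rsplit(b"\n", 1)[-1]
    # Splitting bytes on b"\n" is safe for UTF-8, so decoding here is fine.
    return first.decode().rstrip("\r\n"), last.decode()
```

Each call reads only the first line plus as many chunks as the last line needs, so the cost no longer depends on the number of lines in the file; for several hundred files you would simply call it once per file.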

If you have no idea at all about the possible upper bound on line length, the obvious solution is to loop over the whole file:

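with open(fname) as fh:
    line = None  # avoids a NameError if the file is empty
    for line in fh:  # reads the whole file, one line at a time
        pass
    last = line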

You don't need to bother with the binary flag here; you can just use open(fname).
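A more compact way to write the same full scan (an alternative idiom, not from the original answer) is collections.deque with maxlen=1, which keeps only the most recently seen line. It still reads every line, so it does the same amount of I/O as the loop; the function name last_line_by_scan is hypothetical.

```python
from collections import deque


def last_line_by_scan(path):
    """Return the last line of a text file by reading it from start to end.

    Hypothetical helper: works for any line length, but reads the whole file.
    """
    with open(path) as fh:
        # A deque with maxlen=1 discards older items as new ones arrive,
        # so consuming the file iterator leaves only the final line.
        tail = deque(fh, maxlen=1)
    return tail[0].rstrip("\n") if tail else ""
```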

ETA: Since you have many files to process, you could take a random sample of a couple of dozen files with random.sample and run this code on them, using an a priori large value for the position shift (say, 1 MB), to determine the length of the last line. This will help you estimate the value to use for the full run.

