What is the most efficient way to get the first and last line of a text file?

Question

I have a text file that contains a timestamp on each line. My goal is to find the time range. The times are all in order, so the first line is the earliest time and the last line is the latest. I only need the very first and very last lines. What would be the most efficient way to get these lines in Python?

Note: These files are relatively long, about 1-2 million lines each, and I have to do this for several hundred files.

Recommended answer

See the documentation for the io module.

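import os

with open(fname, 'rb') as fh:
    first = next(fh).decode()  # first line, decoded from bytes

    # Step back at most 1024 bytes from the end; a larger offset than the file size raises OSError
    fh.seek(-min(1024, os.path.getsize(fname)), os.SEEK_END)
    last = fh.readlines()[-1].decode()  # last line found in that tail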

The value 1024 here stands in for the average line length; I chose 1024 only as an example. If you have an estimate of the average line length, you could use twice that value. The offset has to cover at least the whole last line, otherwise the result is truncated.
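When no reliable estimate of the line length is available, one possible variation on the seek approach is to start with a small window and double it until it contains a complete last line. The sketch below is only an illustration under some assumptions: the function name first_and_last_line and the chunk parameter are hypothetical, the file is assumed to use "\n" or "\r\n" line endings and the default UTF-8 decoding, and trailing blank lines are skipped.

```python
import os


def first_and_last_line(path, chunk=1024):
    """Return (first, last) lines as str without reading the whole file.

    Hypothetical helper: trailing newlines are ignored, so the last
    non-blank line is returned.
    """
    with open(path, "rb") as fh:
        first = fh.readline()
        if not first:
            return "", ""  # empty file

        size = fh.seek(0, os.SEEK_END)  # seek returns the new position, i.e. the file size
        window = chunk
        while True:
            window = min(window, size)
            fh.seek(-window, os.SEEK_END)
            body = fh.read(window).rstrip(b"\r\n")
            cut = body.rfind(b"\n")
            # Stop once a newline precedes the last line, or the window covers the whole file.
            if cut != -1 or window == size:
                last = body[cut + 1:]
                break
            window *= 2  # last line is longer than the window: widen and retry

    return first.rstrip(b"\r\n").decode(), last.decode()
```

Because the window only grows when the last line does not fit, typical files need a single small read from the end, while unusually long last lines are still handled.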

If you have no idea at all about the possible upper bound for the line length, the obvious solution is to loop over the file:

for line in fh:
    pass
last = line

With this loop you don't need to bother with the binary flag; you could just use open(fname).
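Put together, the loop-based approach might look like the following sketch. The function name first_and_last_by_scan is hypothetical, and the choice to return (None, None) for an empty file is an assumption, not something the answer specifies. Lines are returned as read, including their trailing newline.

```python
def first_and_last_by_scan(fname):
    """Read the whole file once in text mode and keep only the first and last line."""
    first = last = None
    with open(fname) as fh:
        for line in fh:
            if first is None:
                first = line  # remember the first line only once
            last = line       # overwritten on every iteration
    return first, last
```

This reads every line, so it costs time proportional to the file size, but it needs no assumption about line lengths.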

ETA: Since you have many files to process, you could use random.sample to pick a couple of dozen files and run this code on them with a deliberately large offset (say 1 MB) to determine the length of their last lines. This will help you estimate the value to use for the full run.
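The sampling idea could be sketched roughly as follows. The function name longest_last_line and its parameters (sample_size, window, seed) are hypothetical, chosen only for illustration. The reported length is in bytes and includes the trailing newline; if a last line is longer than the window, only the part inside the window is counted.

```python
import os
import random


def longest_last_line(paths, sample_size=24, window=1024 * 1024, seed=None):
    """Estimate the longest last line (in bytes) over a random sample of files."""
    paths = list(paths)
    rng = random.Random(seed)
    sample = rng.sample(paths, min(sample_size, len(paths)))

    longest = 0
    for path in sample:
        with open(path, "rb") as fh:
            size = fh.seek(0, os.SEEK_END)
            fh.seek(-min(window, size), os.SEEK_END)  # never seek before the start of the file
            lines = fh.readlines()
        if lines:
            longest = max(longest, len(lines[-1]))
    return longest
```

The result, with some safety margin, could then serve as the offset passed to seek for the full run.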
