How to find the byte position of a specific line in a file

Question

What's the fastest way to find the byte position of a specific line in a file, from the command line?

For example:

$ linepos myfile.txt 13
5283

I'm writing a parser for a CSV that's several GB in size, and in the event the parser is halted, I'd like to be able to resume from the last position. The parser is in Python, but even iterating over file.readlines() takes a long time, since there are millions of rows in the file. I'd like to simply do file.seek(int(command.getoutput("linepos myfile.txt %i" % lastrow))), but I can't find a shell command to efficiently do this.

Edit: Sorry for the confusion, but I'm looking for a non-Python solution. I already know how to do this from Python.

Recommended answer

From @chepner's comment on my other answer:

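position = 0  # or wherever you left off last time
try:
    # Binary mode keeps tell()/seek() values as plain byte offsets.
    with open('myfile.txt', 'rb') as file:
        file.seek(position)  # zero in base case
        # Use readline() instead of `for line in file`: iterating the file
        # reads ahead, which makes tell() unreliable (Python 2) or raises
        # an error in text mode (Python 3).
        for line in iter(file.readline, b''):
            # process the line
            position = file.tell()  # offset of the next unprocessed line
except Exception:
    print('exception occurred at position {}'.format(position))
    raise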
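
If only the row number was saved rather than the byte offset, the offset can still be recovered without decoding every line, by counting newline bytes in large binary chunks. The following is a minimal sketch in Python, not the shell command the question asks for. The name `line_offset`, the `chunk_size` parameter, and the 1-based line numbering are assumptions made for illustration, and it assumes lines end in `\n`.

```python
def line_offset(path, line_number, chunk_size=1 << 20):
    """Return the byte offset at which 1-based line `line_number` starts.

    Hypothetical helper: raises ValueError if the file has too few lines.
    """
    if line_number < 1:
        raise ValueError("line_number must be >= 1")
    remaining = line_number - 1  # newlines to skip before the target line
    offset = 0                   # bytes consumed by fully skipped chunks
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError(
                    "file has fewer than {} lines".format(line_number))
            count = chunk.count(b"\n")
            if count < remaining:
                # The target line starts in a later chunk.
                remaining -= count
                offset += len(chunk)
                continue
            # The last newline to skip is inside this chunk: walk to it.
            pos = -1
            for _ in range(remaining):
                pos = chunk.find(b"\n", pos + 1)
            return offset + pos + 1
    return 0  # line 1 always starts at byte 0
```

The returned value can be passed straight to `file.seek()` on a file opened in binary mode. Quoted CSV fields that contain embedded newlines would make physical line numbers differ from row numbers, so this only applies when each row occupies exactly one line.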
