Python: Most optimal way to read file line by line

Question

I have a large input file to read, so I don't want to use enumerate or fo.readlines(). The traditional for line in fo: loop won't work, and I'll explain why, but I feel some modification of it is what I need. Consider the following file:

 input_file.txt:
 3 # No of tests that will follow
 3 # No of points in current test
 1 # 1st x-coordinate
 2 # 2nd x-coordinate
 3 # 3rd x-coordinate
 2 # 1st y-coordinate
 4 # 2nd y-coordinate
 6 # 3rd y-coordinate
 ...

What I need is to read variable-sized chunks of lines, pair the coordinates into tuples, add the tuples to a list of cases, and then move on to reading the next case from the file.

I came up with this:

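# Assumes each line holds only an integer (no trailing '#' comment).
with open(input_file) as f:
    T = int(next(f))  # number of test cases
    for _ in range(T):
        N = int(next(f))  # number of points in this case
        x, y = [], []
        for i in range(N):
            x.append(int(next(f)))  # next(f) replaces Python 2's f.next()
        for i in range(N):
            y.append(int(next(f)))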

Then I couple the two lists into tuples. I feel there must be a cleaner way to do this. Any suggestions?

The y-coordinates need a separate for loop because the x- and y-coordinates of a point are n lines apart. So: read line i; read line (i + n); repeat n times, for each case.

Recommended answer

This might not be the shortest possible solution, but I believe it is "pretty optimal".

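def parse_number(stream):
    # Read one line, drop anything after '#', and convert the rest to int.
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    # Read `count` consecutive numbers into a list.
    return [parse_number(stream) for i in range(count)]

def parse_test(stream):
    # One test: a count, then that many x values, then that many y values.
    count = parse_number(stream)
    return list(zip(parse_coords(stream, count), parse_coords(stream, count)))

def parse_file(stream):
    # Generator: the first number is the test count; yield one test at a time.
    for i in range(parse_number(stream)):
        yield parse_test(stream)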

It eagerly parses all coordinates of a single test, but each test is parsed lazily, only when you ask for it.
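To make the eager/lazy split concrete, here is a small sketch that feeds the parser from an in-memory io.StringIO instead of a real file. The sample data (two tests) is made up for illustration, and the helper functions are repeated only so the snippet runs on its own.

```python
import io

def parse_number(stream):
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    return [parse_number(stream) for _ in range(count)]

def parse_test(stream):
    count = parse_number(stream)
    xs = parse_coords(stream, count)
    ys = parse_coords(stream, count)
    return list(zip(xs, ys))

def parse_file(stream):
    for _ in range(parse_number(stream)):
        yield parse_test(stream)

# Hypothetical sample: 2 tests, the first with 3 points, the second with 1.
SAMPLE = "2\n3\n1\n2\n3\n2\n4\n6\n1\n5\n7\n"

stream = io.StringIO(SAMPLE)
tests = parse_file(stream)        # creating the generator reads nothing
position_before = stream.tell()   # still at the start of the data
first = next(tests)               # reads the header and the first test only
remaining = stream.read()         # lines the generator has not touched yet

print(first)
print(repr(remaining))
```

After the single next() call, the second test is still sitting unread in the stream, which is what "parsed lazily" means here.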

You can use it like this to iterate over the tests:

if __name__ == '__main__':
    with open('input.txt') as istr:
        for test in parse_file(istr):
            print(test)
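If you would rather have the "list of cases" the question asks for, one option is to pass the generator to list(). The sketch below does that on a string shaped like the question's input, including the trailing # comments. The sample is cut down to a single test (an assumption, since the original file is elided), and the functions are repeated so the snippet is self-contained.

```python
import io

def parse_number(stream):
    # partition('#') keeps only the text before the comment marker.
    return int(next(stream).partition('#')[0].strip())

def parse_coords(stream, count):
    return [parse_number(stream) for _ in range(count)]

def parse_test(stream):
    count = parse_number(stream)
    xs = parse_coords(stream, count)
    ys = parse_coords(stream, count)
    return list(zip(xs, ys))

def parse_file(stream):
    for _ in range(parse_number(stream)):
        yield parse_test(stream)

# Hypothetical one-test input in the same layout as input_file.txt.
SAMPLE = """\
1 # No of tests that will follow
3 # No of points in current test
1 # 1st x-coordinate
2 # 2nd x-coordinate
3 # 3rd x-coordinate
2 # 1st y-coordinate
4 # 2nd y-coordinate
6 # 3rd y-coordinate
"""

# list() drains the generator, so every test is parsed up front.
cases = list(parse_file(io.StringIO(SAMPLE)))
print(cases)
```

Note that collecting everything with list() gives up the laziness, so for a very large file iterating test by test is probably the better fit.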

Better function names might help distinguish the eager functions from the lazy one. I'm experiencing a lack of naming creativity right now.
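One possible naming scheme, offered only as a suggestion: prefix the eager helpers with read_ and the generator with iter_, so the call site hints at which functions consume input immediately. The names read_int, read_ints, read_case and iter_cases are hypothetical and not part of the original answer. The behavior is meant to match the functions above.

```python
import io

def read_int(stream):
    """Eager: consume one line and return its integer, ignoring a '#' comment."""
    return int(next(stream).partition('#')[0].strip())

def read_ints(stream, count):
    """Eager: consume `count` lines and return their integers as a list."""
    return [read_int(stream) for _ in range(count)]

def read_case(stream):
    """Eager: consume one whole case and return its (x, y) pairs."""
    count = read_int(stream)
    xs = read_ints(stream, count)
    ys = read_ints(stream, count)
    return list(zip(xs, ys))

def iter_cases(stream):
    """Lazy: yield one case at a time, reading only as far as needed."""
    for _ in range(read_int(stream)):
        yield read_case(stream)

# Made-up input: one case with two points.
sample = io.StringIO("1\n2\n10\n20\n30\n40\n")
cases = list(iter_cases(sample))
print(cases)
```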
