Python fastest access to line in file


Question

I have an ASCII table in a file from which I want to read a particular set of lines (e.g. lines 4003 to 4005). The file could be very long (hundreds of thousands to millions of lines), and I'd like to do this as quickly as possible.

Bad solution: read in the entire file, then slice out those lines:

with open('filename') as f:
    lines = f.readlines()[4003:4005]  # loads every line into memory first

Better solution: enumerate over each line so that the file is never all in memory (a la https://stackoverflow.com/a/2081880/230468):

with open('filename') as f:
    lines = []
    for i, line in enumerate(f):
        if 4003 <= i <= 4005:
            lines.append(line)
        if i > 4005:
            break  # stop reading once past the last wanted line (@Wooble)

Best solution?

But this still requires going through each line. Is there a better (in terms of speed/efficiency) method of accessing a particular line? Should I use linecache even though I will (typically) only access the file once?

Using a binary file instead, in which case it might be easier to skip ahead, is an option, but I'd much rather avoid it.

Recommended answer

I would probably just use itertools.islice. Using islice over an iterable like a file handle means the whole file is never read into memory, and the lines before index 4003 are discarded as quickly as possible. You could even cast the two lines you need into a list pretty cheaply (assuming the lines themselves aren't very long). Then you can exit the with block, closing the file handle.

from itertools import islice
with open('afile') as f:
    lines = list(islice(f, 4003, 4005))
do_something_with(lines)
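
As a rough sketch of how the islice approach could be wrapped for reuse: `read_lines` and `make_sample_file` below are hypothetical names that do not appear in the original answer, and the sample file (one number per line) is only an assumption made for the demonstration. Note that islice counts from 0 and excludes the `stop` index.

```python
import os
import tempfile
from itertools import islice


def read_lines(path, start, stop):
    """Return the lines with 0-based index i where start <= i < stop."""
    with open(path) as f:
        # islice consumes and discards the lines before `start`, then stops
        # reading at `stop`, so the rest of the file is never touched.
        return list(islice(f, start, stop))


def make_sample_file(n_lines):
    """Hypothetical helper: write a temp file whose i-th line contains i."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{i}\n" for i in range(n_lines))
    return path


if __name__ == "__main__":
    path = make_sample_file(5000)
    try:
        print(read_lines(path, 4003, 4005))
    finally:
        os.remove(path)
```

If `stop` lies past the end of the file, islice simply stops early and the function returns however many lines exist.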



Update

But holy cow, linecache is faster for multiple accesses. I created a million-line file to compare islice and linecache, and linecache blew it away.

>>> timeit("x=islice(open('afile'), 4003, 4005); print next(x) + next(x)", 'from itertools import islice', number=1)
4003
4004

0.00028586387634277344
>>> timeit("print getline('afile', 4003) + getline('afile', 4004)", 'from linecache import getline', number=1)
4002
4003

2.193450927734375e-05

>>> timeit("getline('afile', 4003) + getline('afile', 4004)", 'from linecache import getline', number=10**5)
0.14125394821166992
>>> timeit("''.join(islice(open('afile'), 4003, 4005))", 'from itertools import islice', number=10**5)
14.732316970825195
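
One detail in the output above: the two approaches print different lines for the same arguments, because islice counts from 0 while linecache.getline counts from 1. The sketch below shows one way to line the two up; the function names and the numbered sample file are assumptions made for illustration, not part of the original answer.

```python
import linecache
import os
import tempfile
from itertools import islice


def nth_line_islice(path, index):
    """Return the line at 0-based `index`, or '' if the file is too short."""
    with open(path) as f:
        return next(islice(f, index, index + 1), "")


def nth_line_linecache(path, index):
    """Return the same line via linecache, whose line numbers start at 1."""
    return linecache.getline(path, index + 1)


def make_numbered_file(n_lines):
    """Hypothetical helper: write a temp file whose i-th line contains i."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{i}\n" for i in range(n_lines))
    return path


if __name__ == "__main__":
    path = make_numbered_file(5000)
    try:
        # Both calls refer to the same physical line of the file.
        print(nth_line_islice(path, 4003) == nth_line_linecache(path, 4003))
    finally:
        linecache.clearcache()
        os.remove(path)
```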



Constantly re-importing and re-reading the file:

This is not a practical test, but even re-importing linecache at each step, it's only about a second slower than islice.

>>> timeit("from linecache import getline; getline('afile', 4003) + getline('afile', 4004)", number=10**5)
15.613967180252075
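
The session above uses Python 2 `print` statements. For anyone wanting to repeat the comparison on Python 3, a minimal sketch might look like the following; `compare` is a hypothetical name, and the numbers it returns will depend on the machine, the file, and whether linecache already holds the file, so no expected timings are given.

```python
import linecache
import timeit
from itertools import islice


def compare(path, index, repeats=1000):
    """Time `repeats` reads of two consecutive lines with each approach.

    `index` is the 0-based index of the first of the two lines.
    """
    def with_islice():
        # Re-opens and re-scans the file on every call.
        with open(path) as f:
            return "".join(islice(f, index, index + 2))

    def with_linecache():
        # linecache line numbers are 1-based; after the first call the
        # file's lines are served from the in-memory cache.
        return (linecache.getline(path, index + 1)
                + linecache.getline(path, index + 2))

    return {
        "islice": timeit.timeit(with_islice, number=repeats),
        "linecache": timeit.timeit(with_linecache, number=repeats),
    }
```

Calling `compare('afile', 4003)` returns a dict of elapsed seconds for each approach.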



Conclusion

Yes, linecache is faster than islice in every case except when the cache is constantly re-created, but who does that? For the likely scenarios (reading only a few lines, once, or reading many lines, once), linecache is faster and has a terse syntax. The islice syntax is also clean and fast, and it never reads the whole file into memory. In a RAM-tight environment, the islice solution may be the right choice. For very high speed requirements, linecache may be the better choice. Practically, though, in most environments both times are small enough that it almost doesn't matter.
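
On the memory point: linecache keeps the lines of every file it has read in a module-level cache. If memory matters but linecache is still preferred, one option is to clear that cache once the lines have been fetched. This is only a sketch; `read_once` is a hypothetical name, not something from the original answer, and clearcache() drops the cache for all files, not just this one.

```python
import linecache


def read_once(path, first, last):
    """Return lines first..last (1-based, inclusive), then free the cache."""
    try:
        return [linecache.getline(path, n) for n in range(first, last + 1)]
    finally:
        # Drops every file linecache has cached, releasing the memory.
        linecache.clearcache()
```

Line numbers past the end of the file come back as empty strings, since getline returns '' instead of raising.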
