Efficient 'tail' implementation


Problem description

hi

I have a file which is very large, e.g. over 200MB, and I am going to use
Python to code a "tail" command to get the last few lines of the file.
What is a good algorithm for this type of task in Python for very big files?
Initially, I thought of reading everything into an array from the file
and just getting the last few elements (lines), but since it's a very big
file, I don't think that is efficient.
thanks

Solution


s99999999s2...@yahoo.com wrote:

> hi
>
> I have a file which is very large, e.g. over 200MB, and I am going to use
> Python to code a "tail" command to get the last few lines of the file.
> What is a good algorithm for this type of task in Python for very big files?
> Initially, I thought of reading everything into an array from the file
> and just getting the last few elements (lines), but since it's a very big
> file, I don't think that is efficient.
> thanks


I don't think this is a Python-specific issue but a generic problem for
all "file as byte stream" systems. The problem is that a "line" is not a
property of the file but of its content (some big-iron systems use
"records" for lines, which can be addressed in O(1) time).

So the simplest approach is just to read and drop lines until you reach the one you want.

for x in f:                   # read forward, dropping lines as you go
    if x_is_what_I_want:      # placeholder for your own test
        something

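For a tail, "read and drop" means keeping only the most recent N lines while reading forward. Here is a minimal sketch that assumes Python 3 and uses the standard library's `collections.deque` with `maxlen`, which the thread does not mention. `tail_forward` is a hypothetical name. The returned lines keep their trailing newline.

```python
from collections import deque


def tail_forward(path, n=10):
    """Return the last n lines of a text file by reading it front to back.

    Only n lines are held in memory at once, but every line is still read.
    """
    with open(path) as f:
        # A deque with maxlen drops its oldest entry whenever a new one is appended.
        return list(deque(f, maxlen=n))
```

Memory stays bounded by N lines. Run time still grows with the file size because the whole file is read.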
If you really want to, you can do the reverse lookup like this:

import os
f.seek(0, os.SEEK_END)   # move to the end of the file
x = f.tell()             # x is now the file size in bytes

then loop backward byte by byte until you find what you are looking for. This is quite
cumbersome and may not be faster, depending on your content.

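Here is a minimal sketch of that byte-by-byte backward scan. It assumes Python 3, a file opened in binary mode, and "\n" line endings. `tail_bytewise` is a hypothetical name, and it returns the lines as `bytes` without line endings.

```python
import os


def tail_bytewise(path, n=10):
    """Return the last n lines of a file (as bytes) by scanning backwards byte by byte."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)  # seek() returns the new position: the file size
        pos = end
        found = 0
        while pos > 0:
            pos -= 1
            f.seek(pos)
            # Skip a newline that is the very last byte: it only ends the last line.
            if f.read(1) == b"\n" and pos != end - 1:
                found += 1
                if found == n:
                    pos += 1  # the tail starts just after this newline
                    break
        f.seek(pos)
        return f.read().splitlines()
```

Every byte examined costs one `seek()` and one single-byte `read()`. That overhead is why the approach is slow in practice.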

s9************@yahoo.com writes:

> I have a file which is very large, e.g. over 200MB, and I am going to use
> Python to code a "tail" command to get the last few lines of the file.
> What is a good algorithm for this type of task in Python for very big files?
> Initially, I thought of reading everything into an array from the file
> and just getting the last few elements (lines), but since it's a very big
> file, I don't think that is efficient.



Well, 200MB isn't all that big these days. But it's easy to code:

# untested code
with open(filename) as f:
    tail = f.readlines()[-tailcount:]   # keep the last tailcount lines

and you're done. However, it will use a lot of memory. The fastest
approach is probably to work through the file backwards, but that may take multiple
tries to get everything you want:

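# untested code
import os

with open(filename, 'rb') as f:            # binary mode permits seeks relative to EOF
    filesize = f.seek(0, os.SEEK_END)
    blocksize = max(1, tailcount * expected_line_length)
    while True:
        blocksize = min(blocksize, filesize)   # can't seek before the start of the file
        f.seek(-blocksize, os.SEEK_END)
        lines = f.read().splitlines()
        # the block may start mid-line, so require one extra line unless we read everything
        if len(lines) > tailcount or blocksize == filesize:
            break
        blocksize *= 2
tail = lines[-tailcount:]                  # list of bytes; decode as needed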

It would probably be more efficient to read blocks backwards and paste
them together, but I'm not going to get into that.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

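Mike's closing suggestion is to read blocks backwards and paste them together. Here is a minimal sketch of it, assuming Python 3 and "\n" line endings. `tail_blocks` and its 4096-byte default `blocksize` are hypothetical choices, not from the thread. The function stops once more than N newlines have been collected. That guarantees N complete lines even if the earliest block starts mid-line.

```python
import os


def tail_blocks(path, n=10, blocksize=4096):
    """Return the last n lines of a file (as bytes), reading fixed-size blocks backwards."""
    if n <= 0:
        return []
    blocks = []
    newlines = 0
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # start at the end of the file
        # More than n newlines guarantees n complete lines, even if the
        # earliest block read starts in the middle of a line.
        while pos > 0 and newlines <= n:
            step = min(blocksize, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step)
            newlines += block.count(b"\n")
            blocks.append(block)
    # Blocks were collected last-first; paste them together in file order.
    data = b"".join(reversed(blocks))
    return data.splitlines()[-n:]
```

Unlike the doubling loop above, this never rereads bytes it has already read. It only reads as far back as the requested lines require.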


Mike Meyer wrote:

> It would probably be more efficient to read blocks backwards and paste
> them together, but I'm not going to get into that.


That actually is a pretty good idea. Just reverse the buffer and do a
split: the last line becomes the first line, and so on. The logic then
is no different from reading from the beginning of the file. You just need to
keep the last "half line" of the reversed buffer in case the wanted line
happens to straddle a buffer boundary.

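The reversed-buffer idea can be sketched as a generator that yields lines last-first. This minimal sketch assumes Python 3, binary mode, and "\n" line endings. `reversed_lines`, `tail_reversed`, and the 4096-byte default block size are hypothetical. Reversing a block spells each line backwards, so each piece is reversed again before it is yielded. The trailing "half line" is carried into the next block.

```python
import os
from itertools import islice


def reversed_lines(path, blocksize=4096):
    """Yield a file's lines last-first, as bytes without line endings."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        pos = size
        if size > 0:
            f.seek(size - 1)
            if f.read(1) == b"\n":
                pos -= 1  # ignore the newline that terminates the last line
        carry = b""  # reversed "half line" carried over from the previous block
        while pos > 0:
            step = min(blocksize, pos)
            pos -= step
            f.seek(pos)
            # Reverse the block and split it: pieces come out last line first,
            # each one spelled backwards.
            pieces = (carry + f.read(step)[::-1]).split(b"\n")
            carry = pieces.pop()  # may be incomplete: its start lies in an earlier block
            for piece in pieces:
                yield piece[::-1]
        if size > 0:
            yield carry[::-1]  # the file's first line


def tail_reversed(path, n=10):
    """Return the last n lines in file order, using reversed_lines()."""
    return list(islice(reversed_lines(path), n))[::-1]
```

The generator is lazy, so `islice` stops pulling blocks once N lines have been produced. Only the end of the file is read.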
