Return offset of a string with Lua


Question

I'm trying to search rather big files for a certain string and return its offset. I'm new to Lua and my current approach looks like this:

linenumber = 0
for line in io.lines(filepath) do
    result = string.find(line, "ABC", 1)
    linenumber = linenumber + 1

    if result ~= nil then
        offset = linenumber * 4096 + result
        io.close
    end
end

I realize that this way is rather primitive and certainly slow. How could I do this more efficiently?

Thanks in advance.

Solution

If the file is not too big and you can spare the memory, it's faster to slurp in the whole file and use string.find on it. If not, you can search the file block by block.
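The whole-file variant could look roughly like the sketch below. The helper name `find_in_whole_file` is made up for illustration, and the sketch assumes the search string is meant literally rather than as a Lua pattern.

```lua
-- Hypothetical helper: returns the 1-based byte offset of the literal
-- string `needle` in the file at `path`, or nil if it does not occur.
local function find_in_whole_file(path, needle)
    local fh = assert(io.open(path, "rb")) -- binary mode keeps offsets exact
    local data = fh:read("*a")             -- read the entire file into memory
    fh:close()
    -- The fourth argument `true` requests a plain substring search, so
    -- characters such as "." or "%" in `needle` are not treated as pattern magic.
    -- The parentheses keep only the start position.
    return (data:find(needle, 1, true))
end
```

Memory use is roughly the size of the file, so this only makes sense when the file comfortably fits in RAM.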

Your approach isn't all that bad. I'd suggest loading the file in overlapping blocks though. The overlap avoids having the pattern split just between the blocks and going unnoticed like:

".... ...A BC.. ...."
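The effect can be reproduced on an in-memory string. This is only an illustration: `scan` is a hypothetical helper, and the 8-byte block size is chosen so that "ABC" straddles a block boundary.

```lua
local data = ".......ABC......" -- "ABC" occupies bytes 8..10
local size, pat = 8, "ABC"

-- Hypothetical helper: scan `s` in blocks of `size` bytes, each extended by
-- `extra` bytes borrowed from the following block. Returns the 1-based
-- offset of `pat` in `s`, or nil if no block contains it.
local function scan(s, size, extra)
    for start = 1, #s, size do
        local block = s:sub(start, start + size + extra - 1)
        local pos = block:find(pat, 1, true) -- plain substring search
        if pos then return start + pos - 1 end
    end
    return nil
end

print(scan(data, size, 0))        -- no overlap: the split pattern is missed
print(scan(data, size, #pat - 1)) -- overlapping blocks find it at byte 8
```

An overlap of `#pat - 1` bytes is the minimum that guarantees the pattern fits entirely inside some block; a larger overlap also works.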

My implementation goes like this:

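local size = 4096 -- note: size should be bigger than the length of pat to work
local pat = "ABC"
local overlap = #pat
local fh = assert(io.open(filepath, 'rb')) -- on Windows, do NOT forget the b
local block = fh:read(size + overlap)
local n = 0
local offset
while block do
    -- third and fourth arguments: start at 1, plain search (no pattern magic)
    local block_offset = block:find(pat, 1, true)
    if block_offset then
        print(block_offset)
        offset = block_offset + size * n -- 1-based byte offset in the file
        break
    end
    -- a short read means end of file; stop here, otherwise seeking back
    -- would re-read the same last bytes forever
    if #block < size + overlap then break end
    fh:seek('cur', -overlap) -- step back so the next block overlaps this one
    block = fh:read(size + overlap)
    n = n + 1
end
fh:close()

if offset then
    print('found pattern at', offset, 'after reading', n + 1, 'blocks')
else
    print('did not find pattern')
end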

If your file really has lines, you can also use the trick explained here. This section in the Programming in Lua book explains some performance considerations when reading files.
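For line-oriented files, one technique of this kind is to read a large chunk together with the rest of the line it ends in, so that no line is ever split across two chunks. The sketch below is an assumption about how that could be applied to the offset search, not necessarily the exact trick meant above. The helper name `find_by_lines` and the default buffer size are made up, and the search string must not contain a newline.

```lua
-- Hypothetical helper: returns the 1-based byte offset of the literal
-- string `needle` in the file at `path`, or nil if it is absent.
-- Assumes `needle` contains no newline, so it can never span two chunks.
local function find_by_lines(path, needle, bufsize)
    bufsize = bufsize or 8192
    local fh = assert(io.open(path, "rb"))
    local found
    while true do
        local base = fh:seek() -- file position where this chunk starts
        -- Read up to `bufsize` bytes, then the remainder of the current line.
        local chunk, rest = fh:read(bufsize, "*l")
        if not chunk then break end -- end of file
        if rest then chunk = chunk .. rest end
        local pos = chunk:find(needle, 1, true) -- plain substring search
        if pos then
            found = base + pos
            break
        end
    end
    fh:close()
    return found
end
```

Because each chunk ends on a line boundary, no overlap or seeking backwards is needed; `fh:seek()` is only used to ask for the current position.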
