Return offset of a string with Lua
Problem description
I'm trying to search rather big files for a certain string and return its offset. I'm new to Lua, and my current approach looks like this:
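local offset
local pos = 0  -- bytes before the current line (assumes Unix "\n" line endings)
for line in io.lines(filepath) do
  local result = string.find(line, "ABC", 1, true)
  if result then
    offset = pos + result  -- 1-based byte offset in the file
    break
  end
  pos = pos + #line + 1  -- +1 for the newline that io.lines strips
end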
I realize that this way is rather primitive and certainly slow. How could I do this more efficiently?
Thanks in advance.
Answer
If the file is not too big and you can spare the memory, it's faster to slurp in the whole file and use string.find on it once. If not, you can search the file block by block.
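Here is a minimal sketch of the whole-file approach. It assumes the file fits comfortably in memory. find_in_file is a hypothetical helper name. It does a plain (non-pattern) search, so characters such as . or % in the search string are matched literally.

```lua
-- Hypothetical helper: slurp the whole file, then search it once.
-- Returns the 1-based byte offset of the first match, or nil.
local function find_in_file(filepath, pat)
  local fh = assert(io.open(filepath, "rb"))  -- binary mode: offsets are raw bytes
  local data = fh:read("*a")                  -- "*a" reads the entire file
  fh:close()
  -- plain = true: treat pat as a literal string, not a Lua pattern
  return (string.find(data, pat, 1, true))    -- parentheses keep only the start index
end
```

Usage would look like find_in_file(filepath, "ABC"). Memory use is roughly the file size, which is the trade-off against the block-wise search.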
Your approach isn't all that bad, though I'd suggest reading the file in overlapping blocks. The overlap prevents a match that is split across a block boundary from going unnoticed, as with ABC below (spaces mark the block boundaries):
".... ...A BC.. ...."
My implementation goes like this:
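local size = 4096  -- note, size should be bigger than the length of pat to work.
local pat = "ABC"
local overlap = #pat  -- re-read this many bytes so a match across a block boundary is not missed
local fh = assert(io.open(filepath, 'rb'))  -- On Windows, do NOT forget the b
local offset
local n = 0  -- number of blocks read so far
local block = fh:read(size + overlap)
while block do
  n = n + 1
  -- plain find: a fixed overlap only works for a literal string, not a pattern
  local block_offset = block:find(pat, 1, true)
  if block_offset then
    offset = block_offset + size * (n - 1)  -- block n starts at 0-based byte size*(n-1)
    break
  end
  -- a short read means this was the last block; without this check the
  -- seek below would re-read the file's tail forever when there is no match
  if #block < size + overlap then break end
  fh:seek('cur', -overlap)  -- step back so consecutive blocks overlap
  block = fh:read(size + overlap)
end
fh:close()
if offset then
  print('found pattern at', offset, 'after reading', n, 'blocks')
else
  print('did not find pattern')
end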
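The next sketch shows why the overlap matters. It works in memory with the same four-byte blocks as the ".... ...A BC.. ...." example. find_in_blocks is a hypothetical helper that simulates block reads with string.sub instead of file I/O.

```lua
-- Hypothetical helper: simulate reading s in blocks of `size` bytes, where each
-- block also includes the next `overlap` bytes, and search each block in turn.
-- Returns the 1-based offset of the first match in s, or nil.
local function find_in_blocks(s, pat, size, overlap)
  local start = 1
  while start <= #s do
    local block = s:sub(start, start + size + overlap - 1)
    local pos = block:find(pat, 1, true)
    if pos then
      return start + pos - 1  -- convert block-relative position to offset in s
    end
    start = start + size
  end
  return nil
end

-- The four 4-byte blocks from the example: "....", "...A", "BC..", "...."
local text = "...." .. "...A" .. "BC.." .. "...."
print(find_in_blocks(text, "ABC", 4, 0))  --> nil (the match straddles two blocks)
print(find_in_blocks(text, "ABC", 4, 2))  --> 8
```

An overlap of #pat - 1 bytes is the minimum that catches any boundary-straddling match. The answer's #pat also works; it just re-reads one extra byte per block.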
If your file really has lines, you can also use the trick explained here. This section in the Programming in Lua book explains some performance considerations reading files.
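Here is a sketch of one way to exploit line structure. It assumes Lua 5.2 or later and is not necessarily the trick linked above. fh:read(size, "*L") reads size bytes plus the rest of the line they end in, with the newline kept. Every chunk therefore ends on a line boundary, and no overlap is needed as long as the search string contains no newline. find_by_lines is a hypothetical name.

```lua
-- Hypothetical helper: search in large chunks that always end at a line boundary.
-- Assumes pat contains no "\n". Returns the 1-based byte offset, or nil.
local function find_by_lines(filepath, pat, size)
  size = size or 8192
  local fh = assert(io.open(filepath, "rb"))
  local base = 0  -- bytes consumed before the current chunk
  while true do
    -- a block of `size` bytes, then the rest of its last line ("*L" keeps the "\n")
    local chunk, rest = fh:read(size, "*L")
    if not chunk then break end        -- end of file
    if rest then chunk = chunk .. rest end
    local s = string.find(chunk, pat, 1, true)
    if s then
      fh:close()
      return base + s
    end
    base = base + #chunk
  end
  fh:close()
  return nil
end
```

Because "*L" keeps the newline, #chunk is exactly the number of bytes consumed. That keeps the offset bookkeeping to a single addition.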