Multi-line regex search in whole file
Problem description
I've found loads of examples on how to replace text in files using regex. However, it all boils down to two versions:
1. Iterate over all lines in the file and apply the regex to each single line.
2. Load the whole file.
No. 2 is not feasible with "my" files: they're about 2 GiB...
As to No. 1: this is my current approach, but I was wondering... what if I need to apply a regex spanning more than one line?
Recommended answer
Here's the answer:
There is no easy way.
I found a StreamRegex class which might be able to do what I am looking for.
From what I could grasp of the algorithm:
- Start at the beginning of the file with an empty buffer
- Repeat until the end of the file:
  - add a chunk of the file to the buffer
  - if there is a match in the buffer:
    - mark the match
    - drop all data which appeared before the end of the match from the buffer
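The steps above could be sketched in Python roughly as follows. This is not the StreamRegex class itself, only a minimal illustration of the buffering idea. The name `stream_search`, its parameters and the chunk size are hypothetical. One detail is my own addition rather than part of the description: a match that ends exactly at the end of the buffer is held back until more data has been read, because it might continue into the next chunk. Empty matches are skipped, and anchors or lookbehinds may behave differently at the start of a trimmed buffer.

```python
import io
import re


def stream_search(stream, pattern, chunk_size=64 * 1024):
    """Yield (offset, text) for each match found in a text stream.

    Hypothetical helper illustrating the buffer-and-drop idea; offsets are
    character offsets from the start of the stream.
    """
    regex = re.compile(pattern)
    buffer = ""   # data read so far that has not been dropped yet
    base = 0      # stream offset of buffer[0]
    at_eof = False
    while not at_eof:
        chunk = stream.read(chunk_size)
        at_eof = chunk == ""
        buffer += chunk          # add a chunk of the file to the buffer
        consumed = 0
        for m in regex.finditer(buffer):
            if m.end() == len(buffer) and not at_eof:
                # Assumption: the match may extend into the next chunk,
                # so wait for more data before reporting it.
                break
            if m.end() > m.start():
                yield base + m.start(), m.group(0)   # mark the match
            consumed = m.end()
        # Drop everything before the end of the last reported match.
        buffer = buffer[consumed:]
        base += consumed
```

A file would be passed in as an open text stream, for example `stream_search(open(path, encoding="utf-8"), r"(?s)BEGIN\n.*?\nEND")`, where `path` and the pattern are placeholders. Note that the buffer is rescanned from its start after every chunk, so this sketch trades speed for simplicity.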
That way it is not necessary to load the full file, or at least the chances of loading the full file into memory are reduced...
However, the worst case is that there is no match in the whole file; in that case the full file will be loaded into memory.
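One possible way to bound that worst case, which the description above does not mention, is to assume that no match is ever longer than some known limit and to keep only that many trailing characters when nothing matched. The sketch below shows this variant in Python. `bounded_stream_search` and `max_match_len` are hypothetical names, and the approach will silently miss matches longer than the assumed limit.

```python
import io
import re


def bounded_stream_search(stream, pattern, max_match_len, chunk_size=64 * 1024):
    """Like the buffer-and-drop search, but with a capped buffer.

    Assumption (not from the original answer): every match is at most
    max_match_len characters long, so older data can be discarded safely.
    """
    regex = re.compile(pattern)
    buffer = ""
    base = 0      # stream offset of buffer[0]
    at_eof = False
    while not at_eof:
        chunk = stream.read(chunk_size)
        at_eof = chunk == ""
        buffer += chunk
        consumed = 0
        for m in regex.finditer(buffer):
            if m.end() == len(buffer) and not at_eof:
                break  # match may continue into the next chunk
            if m.end() > m.start():
                yield base + m.start(), m.group(0)
            consumed = m.end()
        # Even without a match, keep only the tail that could still be
        # the beginning of a match of at most max_match_len characters.
        consumed = max(consumed, len(buffer) - max_match_len)
        buffer = buffer[consumed:]
        base += consumed
```

With this variant the buffer never holds more than roughly `max_match_len + chunk_size` characters, at the cost of having to know an upper bound on the match length in advance.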