Multi-line regex search in whole file


Question

I've found loads of examples on how to replace text in files using regex. However, it all boils down to two versions:
1. Iterate over all lines in the file and apply the regex to each single line.
2. Load the whole file.

No. 2 is not feasible with my files: they're about 2 GiB.
As to No. 1: this is my current approach, but I was wondering: what if I need to apply a regex that spans more than one line?
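
To illustrate the difference between the two versions, here is a minimal Python sketch. The pattern, the sample text, and the helper names `count_per_line` and `count_whole_file` are made up for this example and are not taken from any real code base.

```python
import io
import re

# Hypothetical pattern: a block that starts with BEGIN and ends with END,
# possibly several lines later. DOTALL lets "." match newlines.
PATTERN = re.compile(r"BEGIN\n.*?\nEND", re.DOTALL)
SAMPLE = "x\nBEGIN\nfoo\nEND\ny\n"


def count_per_line(stream):
    """Version 1: apply the regex to each line on its own."""
    return sum(1 for line in stream if PATTERN.search(line))


def count_whole_file(stream):
    """Version 2: read everything into memory, then search once."""
    return len(PATTERN.findall(stream.read()))


print(count_per_line(io.StringIO(SAMPLE)))    # 0: no single line holds the whole block
print(count_whole_file(io.StringIO(SAMPLE)))  # 1: found, but the whole text is in memory
```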

Recommended answer

Here's the answer:
There is no easy way.

I found a StreamRegex class which might be able to do what I am looking for.
From what I could grasp of the algorithm:

  • Start at the beginning of the file with an empty buffer
  • Repeat:
    • add a chunk of the file to the buffer
    • if there is a match in the buffer
      • mark the match
      • drop all data which appeared before the end of the match from the buffer
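
The steps above can be sketched roughly as follows. This is not the StreamRegex class itself, only a minimal Python illustration of the buffering idea. The function name `stream_search`, the `chunk_size` parameter, and the sample pattern are hypothetical. The sketch also adds one guard that is not part of the description above: a match that touches the end of the buffer is held back until more data arrives, because it might still grow.

```python
import io
import re
from typing import Iterator, TextIO, Tuple


def stream_search(regex: re.Pattern, stream: TextIO,
                  chunk_size: int = 64 * 1024) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) for each match, reading the stream in chunks.

    Assumption: the pattern never matches the empty string.
    """
    buffer = ""   # data read so far that has not been dropped yet
    offset = 0    # position of buffer[0] within the whole stream
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buffer += chunk                      # add a chunk of the file to the buffer
        while True:
            m = regex.search(buffer)
            if m is None:
                break
            if m.end() == len(buffer) and not eof:
                break                        # added guard: match may still grow, read on
            if m.end() == m.start():
                raise ValueError("pattern must not match the empty string")
            # mark the match, reported in whole-stream coordinates
            yield offset + m.start(), offset + m.end(), m.group(0)
            # drop everything before the end of the match
            offset += m.end()
            buffer = buffer[m.end():]


sample = io.StringIO("x\nBEGIN\nfoo\nEND\ny\nBEGIN\nbar\nEND\n")
block = re.compile(r"BEGIN\n.*?\nEND", re.DOTALL)
for start, end, text in stream_search(block, sample, chunk_size=5):
    print(start, end, repr(text))
# 2 15 'BEGIN\nfoo\nEND'
# 18 31 'BEGIN\nbar\nEND'
```

The tiny `chunk_size=5` is only there to force matches to straddle chunk boundaries. With a text-mode stream the reported positions are character offsets, not byte offsets.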

That way it is not necessary to load the full file, or at least the chances of loading the full file into memory are reduced.
However, the worst case is that there is no match in the whole file. In that case the full file will be loaded into memory.
