Performing regex on a stream

Question

I have some large text files on which I'm going to perform consecutive matching (just capturing, not replacing). I'm thinking it's not such a good idea to keep the whole file in memory, but rather to use a Reader.

What I know about the input is that if there's a match, it won't span more than 5 lines. So my idea was to have some sort of buffer that keeps just these 5 lines (or so), do the first search, and continue. But for this to work it has to "know" where the regex match ended: e.g., if the match ends at line 2, the next search should start from there. Is it possible to do something like this in an efficient way?
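
For text that is already in memory, java.util.regex keeps this bookkeeping for you. Matcher.end() reports the offset just past the last match, and each later call to Matcher.find() resumes searching from that point. The minimal sketch below records those end offsets. The MatchEnds class, its method, and the sample text are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchEnds {
    // Returns the end offset of every match. Each find() call starts
    // searching where the previous match ended.
    static List<Integer> matchEnds(Pattern pattern, CharSequence text) {
        List<Integer> ends = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            ends.add(m.end()); // offset just past the last matched character
        }
        return ends;
    }

    public static void main(String[] args) {
        String text = "id=1\nnoise\nid=22\nid=333\n";
        System.out.println(matchEnds(Pattern.compile("id=\\d+"), text)); // [4, 16, 23]
    }
}
```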

Recommended answer

You could use a Scanner and the findWithinHorizon method:

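// yourPattern: a java.util.regex.Pattern (or a regex String); new Scanner(File) throws FileNotFoundException
Scanner s = new Scanner(new File("thefile"));
// horizon 0 = search without bound; returns null when no further match exists
String nextMatch = s.findWithinHorizon(yourPattern, 0);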
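
The sketch below extends this to a loop. It assumes the pattern has one capturing group and that the input arrives as any java.io.Reader. Here a StringReader stands in for a file, and the class name, pattern, and sample text are hypothetical. After a successful findWithinHorizon call, Scanner.match() returns that call's MatchResult, so capture groups can be read without matching again:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class StreamMatches {
    // Collects capture group 1 of every match read from the given Reader.
    static List<String> captures(Reader input, Pattern pattern) {
        List<String> out = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            // findWithinHorizon returns null once no further match exists;
            // on success the scanner advances past the matched text.
            while (s.findWithinHorizon(pattern, 0) != null) {
                out.add(s.match().group(1)); // MatchResult of the last successful find
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical record format: each match spans two lines.
        Pattern p = Pattern.compile("BEGIN (\\w+)\\nEND");
        String text = "x\nBEGIN a\nEND\ny\nBEGIN b\nEND\n";
        System.out.println(captures(new StringReader(text), p)); // [a, b]
    }
}
```

For a file on disk, the same method could be given, for example, Files.newBufferedReader(path). With a horizon of 0 the scanner may still buffer a large part of the input while a search fails, as the Javadoc quoted below explains.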

From the API documentation for findWithinHorizon:

If horizon is 0, then the horizon is ignored and this method continues to search through the input looking for the specified pattern without bound. In this case it may buffer all of the input searching for the pattern.
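
To contrast with the horizon-0 case, here is a small sketch of a positive horizon. The HorizonDemo class, its method, and the inputs are hypothetical. According to the Scanner Javadoc, a positive horizon limits the search to that many characters past the current position. If nothing matches there, the method returns null and leaves the scanner's position unchanged:

```java
import java.util.Scanner;

class HorizonDemo {
    // Runs a bounded search, then an unbounded one, on the same scanner.
    static String[] boundedThenUnbounded(String input, String regex, int horizon) {
        try (Scanner s = new Scanner(input)) {
            String bounded = s.findWithinHorizon(regex, horizon); // looks at most `horizon` chars ahead
            String unbounded = s.findWithinHorizon(regex, 0);     // no limit; may buffer all input
            return new String[] {bounded, unbounded};
        }
    }

    public static void main(String[] args) {
        // "b" sits at index 4, outside a horizon of 3, so the first search fails
        // and the second search starts from the unchanged position 0.
        String[] r = boundedThenUnbounded("aaaab", "b", 3);
        System.out.println(r[0] + " " + r[1]); // null b
    }
}
```

A failed bounded search does not move the scanner. Code that tries to cap buffering this way would therefore have to skip input itself (for instance with nextLine()) before retrying. The horizon also counts characters, not lines, so it would need to cover the longest expected 5-line match.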

A side note: When matching on multiple lines, you might want to look at the constants Pattern.MULTILINE and Pattern.DOTALL.
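
The example below illustrates the side note with a hypothetical BEGIN/END record format. The FlagsDemo class, the patterns, and the text are made up for this example. Without DOTALL, `.` does not match line terminators, so a pattern meant to span lines finds nothing. With DOTALL, `.` can cross lines, and MULTILINE lets `^` and `$` anchor at the start and end of each line:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class FlagsDemo {
    // Returns the first match in the input, or null if there is none.
    static String first(String input, Pattern p) {
        try (Scanner s = new Scanner(input)) {
            return s.findWithinHorizon(p, 0);
        }
    }

    public static void main(String[] args) {
        String text = "head\nBEGIN\nline one\nline two\nEND\ntail\n";
        // No flags: '.' stops at '\n', so BEGIN and END on different lines never match.
        Pattern plain = Pattern.compile("BEGIN.*?END");
        // DOTALL: '.' also matches '\n'; MULTILINE: ^ and $ match at line boundaries.
        Pattern flagged = Pattern.compile("^BEGIN$.*?^END$", Pattern.MULTILINE | Pattern.DOTALL);
        System.out.println(first(text, plain));   // null
        System.out.println(first(text, flagged)); // prints the four lines from BEGIN to END
    }
}
```

If a BEGIN has no matching END, combining DOTALL, a lazy `.*?`, and a horizon of 0 means the scanner may read and buffer the rest of the input while looking for one.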
