Java regex to match multiline records starting with a fixed label

Question

The following is an example of a list of multiline records, each starting with a fixed string label (LABEL):

<Irrelevant line>
...
<Irrelevant line>
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...

Is there a Java regular expression that can match the above and extract each record, i.e.

LABEL ...
...
...

Also, is a regex the fastest way to extract those records, or would reading line by line and checking how each line starts be faster?

Recommended answer

To iterate over all the LABEL groups, use this:

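import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString holds the full input text
Pattern regex = Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // the current LABEL group: regexMatcher.group()
}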

See the demo for the various matches.
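As a minimal, self-contained sketch of how the loop above might be wrapped for reuse: the class name LabelRecords, the method extract, and the sample input are illustrative assumptions, not part of the original answer. The pattern itself is unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabelRecords {
    // Same pattern as in the answer; compiled once and reused.
    private static final Pattern RECORD =
            Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");

    /** Collects every match, i.e. every record that begins with LABEL. */
    static List<String> extract(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one irrelevant line, then two records.
        String sample = "junk line\n"
                + "LABEL a\n  a1\n"
                + "LABEL b\n  b1\n  b2\n";
        for (String record : extract(sample)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Two details are worth noting. Records that are followed by another LABEL keep their trailing line break, while the last record does not, because \Z also matches just before a final line terminator; calling trim() or strip() on each record evens this out if it matters. Also, the leading LABEL is not anchored, so a LABEL appearing in the middle of a line would start a match as well; if that can occur in the data, starting the pattern with ^LABEL is one possible adjustment.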

Explanation

  • (?s) activates DOTALL mode, allowing the dot to match across lines
  • (?m) turns on multi-line mode, allowing ^ and $ to match at the start and end of each line
  • LABEL matches the literal characters
  • .*? lazily matches all characters up to...
  • the point where the lookahead (?=^LABEL|\Z) can assert that what follows is the next LABEL at the start of a line, or the end of the string (in a Java string literal, \Z is written \\Z)
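The answer does not address the second part of the question, the line-by-line alternative. The following is a rough sketch of that approach for comparison, assuming a record starts at any line beginning with the label; the class name LineByLineRecords and the method extract are hypothetical. Which approach is faster depends on the input and the JVM, so it is best measured rather than assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class LineByLineRecords {
    /** Groups lines into records; a line starting with label opens a new record. */
    static List<String> extract(BufferedReader reader, String label) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // stays null until the first label line
        Iterator<String> lines = reader.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith(label)) {
                if (current != null) {
                    records.add(current.toString()); // close the previous record
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
            // Lines before the first label are skipped.
        }
        if (current != null) {
            records.add(current.toString()); // close the final record
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input: one irrelevant line, then two records.
        String sample = "junk line\nLABEL a\n  a1\nLABEL b\n  b1\n  b2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String record : extract(reader, "LABEL")) {
                System.out.println("[" + record + "]");
            }
        }
    }
}
```

Unlike the regex version, this sketch never needs the whole input in memory as a single string, which may matter for large files. It also normalizes line breaks inside a record to \n and drops the trailing one, since reader.lines() strips line terminators.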
