Scala Regex Multiple Block Capturing

Views: 28  Published: 2021/7/6 19:47:55  Tags: regex scala

Question

I'm trying to capture parts of a multi-line string with a regex in Scala. The input is of the form:

val input = """some text
              |begin {
              |  content to extract
              |  content to extract
              |}
              |some text
              |begin {
              |  other content to extract
              |}
              |some text""".stripMargin

I've tried several possibilities that should get me the text out of the begin { } blocks. One of them:

val Block = """(?s).*begin \{(.*)\}""".r

input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}

I get NO MATCH. If I drop the \}, the regex becomes (?s).*begin \{(.*) and it matches the last block, including the unwanted } and "some text". I checked my regex on rubular.com as /.*begin \{(.*)\}/m and it matches at least one block. I thought that once my Scala regex matched the same way, I could start using findAllIn to match all blocks. What am I doing wrong?

I had a look at "Scala Regex enable Multiline option", but I could not manage to capture all occurrences of the text blocks in, for example, a Seq[String]. Any help is appreciated.

Recommended answer

As Alex has said, when you use pattern matching to extract fields from a regular expression, the pattern behaves as if it were anchored (i.e., wrapped in ^ and $), so it must match the entire input. The usual way to avoid this problem is to use findAllIn first. This way:

val input = """some text
              |begin {
              |  content to extract
              |  content to extract
              |}
              |some text
              |begin {
              |  other content to extract
              |}
              |some text""".stripMargin

val Block = """(?s)begin \{(.*)\}""".r

Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})

Otherwise, you can put .* at the beginning and end of the pattern to get around that restriction:

val Block = """(?s).*begin \{(.*)\}.*""".r

input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}
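
A related option, not mentioned in the original answer, is the unanchored view that scala.util.matching.Regex provides. The sketch below is only an illustration under assumptions: the object name, the helper names anchoredCapture and unanchoredCapture, and the one-block sample string are made up here. It contrasts the default extractor, which needs the whole string to match, with the unanchored one, which only needs the pattern to occur somewhere in the string.

```scala
object UnanchoredSketch {
  // Same pattern shape as in the answer above, without leading/trailing .*
  val Block = """(?s)begin \{(.*)\}""".r

  // Unanchored view of the same regex: the extractor succeeds if the
  // pattern is found anywhere in the input.
  val AnyBlock = Block.unanchored

  // Hypothetical helper: default (anchored) extraction.
  // Succeeds only when the pattern matches the entire string.
  def anchoredCapture(text: String): Option[String] = text match {
    case Block(content) => Some(content.trim)
    case _              => None
  }

  // Hypothetical helper: unanchored extraction.
  def unanchoredCapture(text: String): Option[String] = text match {
    case AnyBlock(content) => Some(content.trim)
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    // Illustrative input with a single block surrounded by other text.
    val sample = "some text\nbegin {\n  content to extract\n}\nsome text"
    println(anchoredCapture(sample))   // no full-string match
    println(unanchoredCapture(sample)) // block found inside the text
  }
}
```

With a single block this behaves much like the .* workaround; with several blocks the greedy (.*) would still run to the last closing brace.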

By the way, you probably want a non-greedy (reluctant) matcher, so that each match stops at the first closing brace:

val Block = """(?s)begin \{(.*?)\}""".r

Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})
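
Since the question asks for all blocks as a Seq[String], here is a minimal sketch of one way to collect them, assuming the reluctant pattern above. The object name and the helpers extractBlocks and greedyMatchCount are hypothetical names introduced for illustration. It uses findAllMatchIn and reads capture group 1 directly, so no second pattern match is needed. The greedy variant is included only to show that it yields a single match spanning both blocks.

```scala
object BlockCaptureSketch {
  // Reluctant quantifier: each match ends at the first closing brace.
  val Block = """(?s)begin \{(.*?)\}""".r

  // Greedy variant, kept only for comparison.
  val GreedyBlock = """(?s)begin \{(.*)\}""".r

  // Hypothetical helper: the trimmed body of every begin { } block.
  def extractBlocks(text: String): Seq[String] =
    Block.findAllMatchIn(text).map(_.group(1).trim).toList

  // Hypothetical helper: how many matches the greedy pattern finds.
  def greedyMatchCount(text: String): Int =
    GreedyBlock.findAllMatchIn(text).size

  // Same input as in the question.
  val input: String =
    """some text
      |begin {
      |  content to extract
      |  content to extract
      |}
      |some text
      |begin {
      |  other content to extract
      |}
      |some text""".stripMargin

  def main(args: Array[String]): Unit = {
    extractBlocks(input).foreach(println)
    println(s"greedy matches: ${greedyMatchCount(input)}")
  }
}
```

Trimming is a design choice in this sketch; drop the .trim call if the surrounding newlines and indentation of each block should be preserved.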
