Scala Regex Multiple Block Capturing
Problem description
I'm trying to capture parts of a multi-lined string with a regex in Scala. The input is of the form:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
I've tried several possibilities that should get me the text out of the begin { } blocks. One of them:
val Block = """(?s).*begin \{(.*)\}""".r
input match {
case Block(content) => println(content)
case _ => println("NO MATCH")
}
I get NO MATCH. If I drop the \} so that the regex becomes (?s).*begin \{(.*), it matches the last block, including the unwanted } and "some text". I checked my regex at rubular.com as /.*begin \{(.*)\}/m and it matches at least one block. I thought that once my Scala regex matched the same way, I could start using findAllIn to match all blocks. What am I doing wrong?
I had a look at "Scala Regex enable Multiline option", but I could not manage to capture all the occurrences of the text blocks in, for example, a Seq[String]. Any help is appreciated.
Recommended answer
As Alex has said, when you use pattern matching to extract fields from a regular expression, the pattern acts as if it were anchored (i.e., wrapped in ^ and $), so it has to match the entire input string. The usual way to avoid this problem is to use findAllIn first, like this:
val input = """some text
|begin {
| content to extract
| content to extract
|}
|some text
|begin {
| other content to extract
|}
|some text""".stripMargin
val Block = """(?s)begin \{(.*)\}""".r
Block findAllIn input foreach (_ match {
case Block(content) => println(content)
case _ => println("NO MATCH")
})
Otherwise, you can use .* at the beginning and end to get around that restriction:
val Block = """(?s).*begin \{(.*)\}.*""".r
input match {
case Block(content) => println(content)
case _ => println("NO MATCH")
}
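As an alternative to padding the pattern with .*, scala.util.matching.Regex also provides unanchored, which makes the extractor search for the pattern anywhere in the input instead of requiring the whole string to match. The following is only a minimal sketch of that idea: the object name UnanchoredDemo and the method firstBlock are made up for illustration, and it assumes a reluctant group (.*?) so that only the first block is captured.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of the original answer.
object UnanchoredDemo {
  // (?s) lets . match newlines; .*? stops at the first closing brace.
  val Block: Regex = """(?s)begin \{(.*?)\}""".r

  // Returns the body of the first block found anywhere in the input, if any.
  def firstBlock(input: String): Option[String] = {
    // The unanchored view uses "find" semantics instead of "match whole input".
    val Unanchored = Block.unanchored
    input match {
      case Unanchored(content) => Some(content)
      case _                   => None
    }
  }

  def main(args: Array[String]): Unit = {
    val input = """some text
                  |begin {
                  |  content to extract
                  |}
                  |some text""".stripMargin
    println(firstBlock(input).map(_.trim))
  }
}
```

Like the .* workaround, this yields a single block only; findAllIn is still needed to get every occurrence.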
By the way, you probably want a non-greedy (reluctant) quantifier, so that each block is matched separately instead of everything from the first begin { to the last }:
val Block = """(?s)begin \{(.*?)\}""".r
Block findAllIn input foreach (_ match {
case Block(content) => println(content)
case _ => println("NO MATCH")
})
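Since the question asks for the blocks as a Seq[String] rather than printed output, the same non-greedy pattern can be combined with findAllMatchIn and group(1). This is a sketch under a few assumptions: the names BlockExtractor and extractBlocks are invented here, trimming the surrounding whitespace is a choice made for the example, and blocks are assumed not to contain nested } characters.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of the original answer.
object BlockExtractor {
  // (?s) lets . match newlines; .*? keeps each match inside a single block.
  val Block: Regex = """(?s)begin \{(.*?)\}""".r

  // Collects the body of every "begin { ... }" block, in order of appearance.
  // Trimming is an assumption; drop .trim to keep the raw whitespace.
  def extractBlocks(input: String): Seq[String] =
    Block.findAllMatchIn(input).map(_.group(1).trim).toList

  def main(args: Array[String]): Unit = {
    val input = """some text
                  |begin {
                  |  content to extract
                  |  content to extract
                  |}
                  |some text
                  |begin {
                  |  other content to extract
                  |}
                  |some text""".stripMargin
    extractBlocks(input).foreach(println)
  }
}
```

Using findAllMatchIn avoids matching each result a second time against the extractor, because the captured group is read directly from the Match object.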