Scala regex multiline match with negative lookahead

Question

I'm writing a DSL using Scala's parser combinators. I have recently changed my base class from StandardTokenParsers to JavaTokenParsers to take advantage of the regex features I think I need for one last piece of the puzzle. (see Parsing a delimited multiline string using scala StandardTokenParser)

What I am trying to do is to extract a block of text delimited by some characters ({{ and }} in this example). This block of text can span multiple lines. What I have so far is:

  def docBlockRE = regex("""(?s)(?!}}).*""".r)
  def docBlock: Parser[DocString] =
      "{{" ~> docBlockRE <~ "}}" ^^ { case str => new DocString(str) }}

where DocString is a case class in my DSL. However, this doesn't work. It fails if I feed it the following:

{{
abc
}}

{{
abc
}}

I'm not sure why this fails. If I put a debug wrapper around the parser (http://jim-mcbeath.blogspot.com/2011/07/debugging-scala-parser-combinators.html) I get the following:

docBlock.apply for token 
   at position 10.2 offset 165 returns [19.1] failure: `}}' expected but end of source found

If I try a single delimited block with multiple lines:

{{
abc
def
}}

Then it also fails to parse, with:

docBlock.apply for token 
  at position 10.2 offset 165 returns [16.1] failure: `}}' expected but end of source found

If I remove the DOTALL directive (?s) then I can parse multiple single-line blocks (which doesn't really help me much).

Is there any way of combining multi-line regex with negative lookahead?

One other issue I have with this approach is that, no matter what I do, the closing delimiter must be on a separate line from the text. Otherwise I get the same kind of error message I see above. It is almost like the negative lookahead isn't really working as I expect it to.

Recommended answer

In context:

scala> val rr = """(?s).*?(?=}})""".r
rr: scala.util.matching.Regex = (?s).*?(?=}})

scala> object X extends JavaTokenParsers {val r: Parser[String] = rr; val b: Parser[String] = "{{" ~>r<~"}}" ^^ { case s => s } }
defined object X

scala> X.parseAll(X.b, """{{ abc
     | def
     | }}""")
res15: X.ParseResult[String] =
[3.3] parsed: abc
def
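
The same reluctant match with a positive lookahead can also be tried outside the parser, using only scala.util.matching.Regex. The following is a minimal sketch, not part of the original answer: the object name DocBlocks and the method extract are hypothetical, and the braces are escaped because an unescaped {{ at the start of a Java pattern is rejected.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration; not from the original post.
object DocBlocks {
  // (?s)      lets '.' match line terminators (DOTALL)
  // (.*?)     is reluctant, so it stops at the first place the lookahead succeeds
  // (?=\}\})  requires "}}" to follow but does not consume it
  val block: Regex = """(?s)\{\{(.*?)(?=\}\})""".r

  // Returns the trimmed body of every {{ ... }} block found in the input.
  def extract(input: String): List[String] =
    block.findAllMatchIn(input).map(_.group(1).trim).toList
}
```

Calling DocBlocks.extract on the two-block input from the question yields one entry per block, each with the surrounding newlines trimmed.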

More, to show the difference between greedy and reluctant matching:

scala> val rr = """(?s)(.*?)(?=}})""".r.unanchored
rr: scala.util.matching.UnanchoredRegex = (?s)(.*?)(?=}})

scala> def f(s: String) = s match { case rr(x) => x case _ => "(none)" }
f: (s: String)String

scala> f("something }} }}")
res3: String = "something "

scala> val rr = """(?s)(.*)(?=}})""".r.unanchored
rr: scala.util.matching.UnanchoredRegex = (?s)(.*)(?=}})

scala> def f(s: String) = s match { case rr(x) => x case _ => "(none)" }
f: (s: String)String

scala> f("something }} }}")
res4: String = "something }} "

The lookahead just means "make sure this follows me, but don't consume it."

Negative lookahead just means "make sure this doesn't follow me."
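
This also explains why the pattern in the question fails: a lookahead is tested only at the position where it appears, so (?!}}) is checked once at the start, and .* with (?s) then consumes everything up to the end of the input, leaving no }} for the parser to match. If a negative lookahead is still preferred, one option is to test it before every character. The sketch below illustrates this under assumptions: the object and value names are hypothetical, and findPrefixOf is used here only to approximate matching at the parser's current position.

```scala
import scala.util.matching.Regex

// Hypothetical demo for illustration; names are not from the original post.
object LookaheadDemo {
  // The question's pattern: the lookahead is tested once, then .* runs to the end.
  val checkedOnce: Regex = """(?s)(?!\}\}).*""".r

  // Alternative: test the lookahead before each character that is consumed.
  val checkedEachChar: Regex = """(?s)(?:(?!\}\}).)*""".r

  // Text as it might look right after the opening "{{" has been consumed.
  val body = "\nabc\ndef\n}}\ntrailing"

  // findPrefixOf matches only at the start of the input.
  val greedy: Option[String] = checkedOnce.findPrefixOf(body)
  val tempered: Option[String] = checkedEachChar.findPrefixOf(body)
}
```

Here greedy contains the whole input, including the closing delimiter, while tempered stops just before the first }}. The reluctant form from the answer, .*?(?=}}), gives the same result as the second pattern and is usually easier to read.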
