Scala RegexParsers中的非贪心匹配 [英] non-greedy matching in Scala RegexParsers

查看:136
本文介绍了Scala RegexParsers中的非贪心匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我正在Scala中编写一个基本的SQL解析器.我有以下内容:

Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:

class Arith extends RegexParsers {
    def selectstatement: Parser[Any] = selectclause ~ fromclause
    def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
    def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
    def tokens: Parser[Any] = rep(token) //how to make this non-greedy?
    def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
}

当尝试将selectstatement与SELECT foo FROM bar匹配时,如何防止由于~ tokens中的rep(token)而使selectclause吞噬整个短语?

When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause from gobbling up the entire phrase due to the rep(token) in ~ tokens?

换句话说,如何在Scala中指定非贪婪匹配?

In other words, how do I specify non-greedy matching in Scala?

为澄清起见,我完全意识到我可以在String模式本身内使用标准的非贪婪语法(*?)或(+?),但是我想知道是否有一种方法可以在内部更高级def令牌.例如,如果我定义了这样的令牌:

To clarify, I'm fully aware that I can use standard non-greedy syntax (*?) or (+?) within the String pattern itself, but I wondered if there's a way to specify it at a higher level inside def tokens. For example, if I had defined token like this:

def token: Parser[Any] = stringliteral | numericliteral | columnname

然后我如何为def令牌中的rep(token)指定非贪婪匹配?

Then how can I specify non-greedy matching for the rep(token) inside def tokens?

推荐答案

不容易,因为不会重试成功的匹配.考虑例如:

Not easily, because a successful match is not retried. Consider, for example:

object X extends RegexParsers {
  def p = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}

scala> X.parseAll(X.p, "aaaab")
res1: X.ParseResult[X.~[String,String]] = 
[1.2] failure: `ab' expected but `a' found

aaaab
 ^

在括号内的解析器中,第一个匹配成功,因此继续进行下一个匹配.那一个失败了,所以p失败了.如果p是替代匹配项的一部分,则将尝试替代项,因此诀窍是产生可以处理这种事情的东西.

The first match was successful, in parser inside parenthesis, so it proceeded to the next one. That one failed, so p failed. If p was part of alternative matches, the alternative would be tried, so the trick is to produce something that can handle that sort of thing.

假设我们有这个:

def nonGreedy[T](rep: => Parser[T], terminal: => Parser[T]) = Parser { in =>
  def recurse(in: Input, elems: List[T]): ParseResult[List[T] ~ T] =
    terminal(in) match {
      case Success(x, rest) => Success(new ~(elems.reverse, x), rest)
      case _ => 
        rep(in) match {
          case Success(x, rest) => recurse(rest, x :: elems)
          case ns: NoSuccess    => ns
        }
    }

  recurse(in, Nil)
}  

然后您可以像这样使用它:

You can then use it like this:

def p = nonGreedy("a", "ab")

顺便说一句,我总是发现查看其他内容的定义有助于尝试提出上述nonGreedy之类的内容.特别是,请查看rep1的定义方式,以及如何更改它以避免重新评估其重复参数-在nonGreedy上,同样的事情可能会有用.

By the way,I always found that looking at how other things are defined is helpful in trying to come up with stuff like nonGreedy above. In particular, look at how rep1 is defined, and how it was changed to avoid re-evaluating its repetition parameter -- the same thing would probably be useful on nonGreedy.

这是一个完整的解决方案,只需进行一些更改即可避免使用终端".

Here's a full solution, with a little change to avoid consuming the "terminal".

trait NonGreedy extends Parsers {
    def nonGreedy[T, U](rep: => Parser[T], terminal: => Parser[U]) = Parser { in =>
      def recurse(in: Input, elems: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(elems.reverse, in)
          case _ => 
            rep(in) match {
              case Success(x, rest) => recurse(rest, x :: elems)
              case ns: NoSuccess    => ns
            }
        }

      recurse(in, Nil)
    }  
}

class Arith extends RegexParsers with NonGreedy {
    // Just to avoid recompiling the pattern each time
    val select: Parser[String] = "(?i)SELECT".r
    val from: Parser[String] = "(?i)FROM".r
    val token: Parser[String] = "(\\s*)\\w+(\\s*)".r
    val eof: Parser[String] = """\z""".r

    def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
    def selectclause(terminal: Parser[Any]): Parser[Any] = 
      select ~ tokens(terminal)
    def fromclause(terminal: Parser[Any]): Parser[Any] = 
      from ~ tokens(terminal)
    def tokens(terminal: Parser[Any]): Parser[Any] = 
      nonGreedy(token, terminal)
}

这篇关于Scala RegexParsers中的非贪心匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆