How to skip whitespace but use it as a token delimiter in a parser combinator

Question

I am trying to build a small parser where the tokens (luckily) never contain whitespace. Whitespace (spaces, tabs and newlines) is essentially a token delimiter (apart from cases where there are brackets etc.).

I am extending the RegexParsers class. If I turn on skipWhitespace, the parser greedily joins tokens together when the next token matches the regular expression of the previous one. If I turn off skipWhitespace, on the other hand, it complains because the spaces are not part of the definition. I am trying to match the BNF as closely as possible, and given that whitespace is almost always the delimiter (apart from brackets or some other cases where the delimiter is explicitly defined in the BNF), is there a way to avoid putting a whitespace regex in all my definitions?

Update

This is a small test example where the tokens are being joined together:

import scala.util.parsing.combinator.RegexParsers

object TestParser extends RegexParsers {
  def test  = "(test" ~> name <~ ")"

  def name : Parser[String] = (letter ~ (anyChar*)) ^^ { case first ~ rest => (first :: rest).mkString}

  def anyChar = letter | digit | "_".r | "-".r
  def letter = """[a-zA-Z]""".r
  def digit = """\d""".r

  def main(args: Array[String]): Unit = {

    val s = "(test hello these should not be joined and I should get an error)"

    val res = parseAll(test, s)
    res match {
      case Success(r, n) => println(r)
      case Failure(msg, n) => println(msg)
      case Error(msg, n) => println(msg)
    }

  }

}

In the above case I just get the string joined together. A similar effect occurs if I change test to the following: I expect it to give me the list of separate words after test, but instead it joins them together and gives me a one-element list containing one long string, without the spaces in between:

def test  = "(test" ~> (name+) <~ ")"

Answer

With skipWhitespace on, RegexParsers skips whitespace just before every string-literal or regex parser it applies. So, in this snippet:

def name : Parser[String] = (letter ~ (anyChar*)) ^^ { case first ~ rest => (first :: rest).mkString}

it skips whitespace before the first letter and again before every single character matched by anyChar*, which is why separate words end up merged into one name.
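The effect can be reproduced in isolation. The following is a minimal sketch, assuming the scala-parser-combinators library is on the classpath; the object name PerCharDemo and the parser name word are made up for illustration and do not appear in the question.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object: a "word" built from one regex per character.
object PerCharDemo extends RegexParsers {
  // Each application of this regex parser first skips leading whitespace.
  def letter: Parser[String] = """[a-zA-Z]""".r

  // One or more letters, concatenated into a single string.
  def word: Parser[String] = rep1(letter) ^^ (_.mkString)

  def main(args: Array[String]): Unit = {
    // The spaces are skipped before each letter, so the three
    // one-letter words are joined into a single result.
    println(parseAll(word, "a b c").get) // prints "abc"
  }
}
```

Because the whitespace is consumed inside word, nothing is left to separate one word from the next.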

Use regular expressions (or plain strings) for each token, not each lexical element. Like this:

object TestParser extends RegexParsers {
  def test  = "(test" ~> name <~ ")"
  def name : Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // ...
}
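For reference, here is a complete, runnable sketch of the same idea, again assuming the scala-parser-combinators library is available. The names TokenParser, single and many are chosen for illustration; only name and its regex come from the answer above.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical object name; one regex per token, as the answer suggests.
object TokenParser extends RegexParsers {
  // Whitespace is skipped once, before the whole name, not inside it.
  def name: Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // Exactly one name between "(test" and ")".
  def single: Parser[String] = "(test" ~> name <~ ")"

  // One or more names, returned as separate list elements.
  def many: Parser[List[String]] = "(test" ~> rep1(name) <~ ")"

  def main(args: Array[String]): Unit = {
    println(parseAll(single, "(test hello)").get)            // prints "hello"
    println(parseAll(single, "(test hello these)").successful) // prints "false"
    println(parseAll(many, "(test hello these)").get)        // prints "List(hello, these)"
  }
}
```

With a single regex per token, a space ends the current name, so the one-name rule rejects input containing several words and the repeated rule returns them as separate elements.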
