Scala Parser Token Delimiter Problem

Question

I'm trying to define a grammar for the commands below.

object ParserWorkshop {
    def main(args: Array[String]) = {
        ChoiceParser("todo link todo to database")
        ChoiceParser("todo link todo to database deadline: next tuesday context: app.model")
    }
}

The second command should be tokenized as:

action = todo
message = link todo to database
properties = [deadline: next tuesday, context: app.model]

When I run this input on the grammar defined below, I receive the following error message:

[1.27] parsed: Command(todo,link todo to database,List())
[1.36] failure: string matching regex `\z' expected but `:' found

todo link todo to database deadline: next tuesday context: app.model
                                   ^

As far as I can see, it fails because the pattern for matching the words of the message is nearly identical to the pattern for the key of a property's key:value pair, so the parser cannot tell where the message ends and the properties start. I can solve this by requiring a start token for each property, like so:

todo link todo to database :deadline: next tuesday :context: app.model

But I would prefer to keep the command as close to natural language as possible. I have two questions:

What does the error message actually mean? And how would I modify the existing grammar to work for the given input strings?

import scala.util.parsing.combinator._

case class Command(action: String, message: String, properties: List[Property])
case class Property(name: String, value: String)

object ChoiceParser extends JavaTokenParsers {
    def apply(input: String) = println(parseAll(command, input))

    def command = action~message~properties ^^ {case a~m~p => new Command(a, m, p)}

    def action = ident

    def message = """[\w\d\s\.]+""".r

    def properties = rep(property)

    def property = propertyName~":"~propertyValue ^^ {
        case n~":"~v => new Property(n, v)
    }

    def propertyName: Parser[String] = ident

    def propertyValue: Parser[String] = """[\w\d\s\.]+""".r
}

Recommended answer

It is really simple. When you use ~, you have to understand that there's no backtracking into individual parsers that have already completed successfully.

So, for instance, message consumed everything up to the colon, since all of that matches its pattern. Next, properties is a rep of property, which requires propertyName, but it only finds the colon (the first character not gobbled up by message). So propertyName fails, and property fails. Now properties, as mentioned, is a rep, so it finishes successfully with 0 repetitions, which in turn makes command finish successfully.

So, back to parseAll. The command parser returned successfully, having consumed everything before the colon. parseAll then asks: are we at the end of the input (\z)? No, because a colon comes next. So it expected end-of-input but got a colon.
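
The walkthrough above can be checked with the regex alone, outside the parser combinator library. This is only a sketch: the object and value names are hypothetical, and it assumes the parser has already consumed the action "todo" and skipped the following space, which is what JavaTokenParsers does by default.

```scala
object GreedyMessageSketch {
  // The message pattern exactly as written in the question.
  val messagePattern = """[\w\d\s\.]+""".r

  // Input left over after `action` matched "todo" and whitespace was skipped.
  val afterAction = "link todo to database deadline: next tuesday context: app.model"

  // findPrefixOf mirrors how a regex parser matches at the current position.
  val consumed: Option[String] = messagePattern.findPrefixOf(afterAction)

  // Whatever the message did not consume is what parseAll sees next.
  val remaining: String = afterAction.drop(consumed.map(_.length).getOrElse(0))

  // 1-based column of the first unconsumed character in the full command
  // ("todo " contributes the first five characters).
  val failureColumn: Int = "todo ".length + consumed.map(_.length).getOrElse(0) + 1

  def main(args: Array[String]): Unit = {
    println(consumed)
    println(remaining)
    println(failureColumn)
  }
}
```

The message swallows "deadline" as well, so the first unconsumed character is the colon, at the column reported in the error message.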

You'll have to change the regex so it won't consume the last identifier before a colon. For example:

def message = """[\w\d\s\.]+(?![:\w])""".r
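
To see what the negative lookahead changes, the two patterns can be compared directly on the sample input. This is a sketch using only the standard regex API; the object and value names are hypothetical. It also checks propertyValue, which in the question uses the same greedy pattern. My reading is that propertyValue would need the same lookahead for the second property to parse, although the answer only mentions message.

```scala
import scala.util.matching.Regex

object LookaheadSketch {
  // Pattern from the question, and the variant with the lookahead added.
  val greedy: Regex = """[\w\d\s\.]+""".r
  val withLookahead: Regex = """[\w\d\s\.]+(?![:\w])""".r

  // Input seen by `message` once the action "todo" has been consumed.
  val messageInput = "link todo to database deadline: next tuesday context: app.model"

  // Input seen by `propertyValue` once "deadline:" has been consumed.
  val valueInput = "next tuesday context: app.model"

  val greedyMessage: Option[String] = greedy.findPrefixOf(messageInput)
  val fixedMessage: Option[String] = withLookahead.findPrefixOf(messageInput)

  val greedyValue: Option[String] = greedy.findPrefixOf(valueInput)
  val fixedValue: Option[String] = withLookahead.findPrefixOf(valueInput)

  def main(args: Array[String]): Unit = {
    println(s"message: $greedyMessage vs $fixedMessage")
    println(s"value:   $greedyValue vs $fixedValue")
  }
}
```

The lookahead forces the regex to back off until the match is followed by neither a word character nor a colon, so it stops before the identifier that precedes a colon.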

By the way, when you use def you force the expression to be re-evaluated. In other words, each of these defs creates a parser every time it is called, and the regular expressions are instantiated every time the parsers they belong to are processed. If you change everything to val, you'll get much better performance.
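
The re-evaluation cost can be illustrated without the parser library. In this sketch the counter and all names are hypothetical and exist only for the demonstration. It uses lazy val rather than plain val, on the assumption that the grammar keeps its current ordering: a plain val that refers to another val defined further down (as command does with action) would see null during initialization.

```scala
import scala.util.matching.Regex

object DefVersusLazyVal {
  // Counts how often the pattern is compiled (illustration only).
  var compilations = 0

  private def compile(): Regex = {
    compilations += 1
    """[\w\d\s\.]+""".r
  }

  // Re-evaluated, and therefore recompiled, on every access.
  def messageDef: Regex = compile()

  // Evaluated once on first access, then cached.
  lazy val messageLazyVal: Regex = compile()

  def main(args: Array[String]): Unit = {
    messageDef; messageDef; messageDef
    println(s"after three def accesses: $compilations")
    messageLazyVal; messageLazyVal; messageLazyVal
    println(s"after three lazy val accesses: $compilations")
  }
}
```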

Remember, these definitions only define the parser; they do not run it. It is parseAll that runs a parser.
