How can I create a parser combinator in which line endings are significant?

Problem description

I am creating a DSL, and using Scala's parser combinator library to parse the DSL. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:

create_model do
  at 0,0,0
end

Line endings are significant in the DSL, as they are effectively used as statement terminators.

I wrote a Scala parser that looks like this:

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

Since line endings matter, I overrode whiteSpace so that it'll only treat spaces and tabs as whitespace (instead of treating new lines as whitespace, and thus ignoring them).

This works, except for the "end" statement for commandBlock. Since my source file contains a trailing new line, the parser complains that it was expecting just an end but got a new line after the end keyword.

So I changed commandBlock's definition to this:

def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)

(That is, I added an optional new line after "end").

But now, when parsing the source file, I get the following error:

[4.1] failure: `end' expected but `' found

I think this is because, after it sucks in the trailing new line, the parser is encountering an empty string which it thinks is invalid, but I'm not sure why it's doing this.

Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant new line characters are also welcome.

Solution

I get the same error either way, but I think you are misinterpreting it. What it's saying is that it expected an end, but it had already reached the end of the input.

And the reason that is happening is that end is being read as a statement: ident matches end, argumentList accepts an empty argument list, and eol then consumes the trailing new line. By the time the parser looks for the closing "end", nothing is left. Now, I'm sure there's a nice way to solve this, but I'm not experienced enough with Scala parsers. It seems the way to go would be to use token parsers with a scanning part, but I couldn't figure out a way to make the standard token parser not treat newlines as whitespace.

So, here's an alternative:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
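  // Only spaces and tabs are skipped, so newlines stay significant.
  override val whiteSpace = """[ \t]+""".r
  // Reserved words that must not be parsed as identifiers.
  // Note: these are prefix matches, so names such as "done" or "endpoint" are rejected too.
  def keywords: Parser[Any] = "do" | "end"
  def identifier: Parser[Any] = not(keywords)~ident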

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = identifier
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = identifier
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  // floatingPointNumber goes first: wholeNumber would otherwise match only the "1" of "1.5".
  def constant: Parser[Any] = floatingPointNumber | wholeNumber
}
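
To see the difference between the two grammars, here is a minimal sketch that runs both against the sample source. The names OriginalML3D, FixedML3D, name, keyword and ML3DDemo are made up for this illustration and the rules are condensed, so treat it as one possible arrangement rather than the definitive one. It also assumes the scala-parser-combinators module is on the classpath. The keyword rule here uses a regex with a word boundary, which is a variation on the answer above: it should let identifiers such as endpoint through while still rejecting a bare end.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Condensed version of the grammar from the question (hypothetical class name),
// including the optional eol after "end".
class OriginalML3D extends JavaTokenParsers {
  // Only spaces and tabs are skipped, so newlines stay significant.
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] =
    command ~ "do" ~ eol ~ rep(statement) ~ "end" ~ opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  // Any identifier is accepted as a name here, including "end".
  def name: Parser[Any] = ident
  def command: Parser[Any] = name ~ opt(stringLiteral)
  def statement: Parser[Any] = name ~ repsep(argument, ",") ~ eol
  def argument: Parser[Any] = stringLiteral | floatingPointNumber | wholeNumber
}

// Same grammar, except that a name may not be a reserved word.
class FixedML3D extends OriginalML3D {
  // \b makes this match whole words only, so "endpoint" is not a keyword.
  def keyword: Parser[Any] = """(do|end)\b""".r
  // not(...) consumes no input; it only checks that no keyword starts here.
  override def name: Parser[Any] = not(keyword) ~> ident
}

object ML3DDemo extends App {
  val source = "create_model do\n  at 0,0,0\nend\n"

  val original = new OriginalML3D
  val fixed = new FixedML3D

  // "end" is consumed as a statement, so the closing "end" is never found.
  println(original.parseAll(original.model, source).successful)
  // "end" is rejected as a name, so the statement list stops before it.
  println(fixed.parseAll(fixed.model, source).successful)
}
```

With the sample source, the first grammar is expected to fail and the second to succeed. The exact wording of the failure message depends on the library version, so the sketch only checks whether the parse was successful.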
