Scala parser combinators for a language embedded in HTML or text (like PHP)

Question

I have been playing around with Scala parser combinators for some time now, and have learned some ways to make them behave nicely and do most of the things I want using the built-in functions.

But how do you make an embedded language (like PHP or Ruby's ERB)? It requires that whitespace outside the embedded code not be ignored.

I managed to make a simple parser that matches all text up to a given regex match, but I am looking for a better, prettier way of doing this. There is probably some already defined function that does what is needed.

The test language parses text like:

now: [[ millis; ]]
and now: [[; millis; ]]

The parser is implemented by the following code:

package test

// Since Scala 2.11 the parser combinators live in the separate scala-parser-combinators module.
import scala.util.parsing.combinator.RegexParsers
import scala.util.matching.Regex

sealed abstract class Statement
case class Print(s: String) extends Statement
case class Millis() extends Statement

object SimpleLang extends RegexParsers {

  // Consumes all input up to (but not including) the first match of r.
  // Unlike the implicit String/Regex conversions, it does not skip leading whitespace.
  def until(r: Regex): Parser[String] = new Parser[String] {
    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      r.findFirstMatchIn(source.subSequence(offset, source.length)) match {
        case Some(matched) =>
          Success(source.subSequence(offset, offset + matched.start).toString, in.drop(matched.start))
        case None =>
          Failure("string matching regex `" + r + "' expected but `" + in.first + "' found", in)
      }
    }
  }

  def until(s: String): Parser[String] = until(java.util.regex.Pattern.quote(s).r)

  def interpret(stats: List[Statement]): Unit = stats match {
    case Print(s) :: rest => {
      print(s)
      interpret(rest)
    }
    case Millis() :: rest => {
      print(System.currentTimeMillis)
      interpret(rest)
    }
    case Nil => ()
  }

  def apply(input: String) : List[Statement] = parseAll(beginning, input) match {
    case Success(tree,_) => tree
    case e: NoSuccess => throw new RuntimeException("Syntax error: " + e)
  }

  /** GRAMMAR **/

  def beginning = (
    "[[" ~> stats |
    until("[[") ~ "[[" ~ stats ^^ { 
      case s ~ _ ~ ss => Print(s) :: ss
    }
  )

  def stats = rep1sep(stat, ";")

  def stat = (
    "millis" ^^^ { Millis() } |
    "]]" ~> ( (until("[[") <~ "[[") | until("\\z".r)) ^^ {
      case s => Print(s)
    }
  )

  def main(args: Array[String]): Unit = {
    val tree = SimpleLang("now: [[ millis; ]]\nand now: [[; millis; ]]")
    println(tree)
    interpret(tree)
  }

}

Recommended answer

Scala's RegexParsers trait provides implicit conversions from String and Regex to Parser[String] that skip any leading whitespace before checking for a match. You can use

override val skipWhitespace = false

to turn this behavior off, or override the whiteSpace member (another regex, \s+ by default) to define your own notion of whitespace.
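
A minimal sketch of both settings, assuming scala-parser-combinators 2.x (the `//> using dep` line is a Scala CLI directive with an assumed version; plain scalac treats it as a comment). The objects `DefaultWs`, `NoWs`, `SpacesOnly` and the parser `pair` are hypothetical names used only for illustration:

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
// Hypothetical demo objects; the library version above is an assumption.
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Default: String and Regex parsers skip leading whitespace matching \s+.
object DefaultWs extends RegexParsers {
  def pair = "a" ~ "b"
}

// Option 1: switch whitespace skipping off for the whole grammar.
object NoWs extends RegexParsers {
  override val skipWhitespace = false
  def pair = "a" ~ "b"
}

// Option 2: keep skipping, but redefine whitespace (spaces and tabs only, not newlines).
object SpacesOnly extends RegexParsers {
  override protected val whiteSpace: Regex = """[ \t]+""".r
  def pair = "a" ~ "b"
}
```

Here `NoWs.parseAll(NoWs.pair, "a b")` fails because the literal `"b"` is tried directly at the space, while `SpacesOnly` still skips spaces and tabs but treats a newline as significant, which is one way to make line breaks part of the grammar.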

These options apply globally: turning off whitespace skipping means that ALL literal and regex productions will see the whitespace.
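
A sketch of the question's test language with skipping turned off for the whole grammar, assuming scala-parser-combinators 2.x (assumed version in the directive). With nothing skipped implicitly, the built-in regex conversion can replace the hand-written `until`: a lazy `.*?` followed by the lookahead `(?=\[\[|\z)` stops just before the next `[[` or at the end of input. The price is that whitespace inside `[[ ... ]]` must now be consumed explicitly, here by a hypothetical `ws` parser. The AST classes are copied from the question; `GlobalNoSkip`, `ws`, `text`, `stat`, `code` and `template` are illustrative names:

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
// Illustrative grammar; names other than the AST classes are hypothetical.
import scala.util.parsing.combinator.RegexParsers

sealed abstract class Statement
case class Print(s: String) extends Statement
case class Millis() extends Statement

object GlobalNoSkip extends RegexParsers {
  // No implicit whitespace skipping anywhere in this grammar.
  override val skipWhitespace = false

  // Inside [[ ... ]] whitespace is insignificant, so it is consumed explicitly.
  def ws: Parser[String] = """\s*""".r

  // Template text: everything up to the next "[[" or the end of input (possibly empty).
  def text: Parser[Statement] = """(?s).*?(?=\[\[|\z)""".r ^^ { s => Print(s) }

  // A possibly empty statement, so that "[[; millis; ]]" is accepted.
  def stat: Parser[Option[Statement]] = ws ~> opt("millis" ^^^ Millis()) <~ ws

  def code: Parser[List[Statement]] = "[[" ~> repsep(stat, ";") <~ "]]" ^^ (_.flatten)

  // text ([[ code ]] text)*
  def template: Parser[List[Statement]] = text ~ rep(code ~ text) ^^ {
    case first ~ rest => first :: rest.flatMap { case stats ~ t => stats :+ t }
  }

  def main(args: Array[String]): Unit =
    println(parseAll(template, "now: [[ millis; ]]\nand now: [[; millis; ]]"))
}
```

For the question's sample input this yields the same statements as the original grammar: `Print("now: ")`, `Millis()`, `Print("\nand now: ")`, `Millis()`, `Print("")`.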

Another option is to avoid the regex conversion in just the few places where you need to see whitespace. I've done this in a CSS parser that ignores comments in most places, but needs to read them just before a rule to extract some Javadoc-style metadata.
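
The selective approach can be sketched the same way, again assuming scala-parser-combinators 2.x and reusing the question's AST. Default skipping stays on, so the code between `[[` and `]]` needs no whitespace handling; only the template text is read with the character-level `acceptIf` and `elem` parsers, which (unlike the implicit String and Regex conversions) do not skip whitespace. `SelectiveSkip`, `textChar`, `text`, `stat`, `code` and `template` are hypothetical names:

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
// Illustrative grammar; names other than the AST classes are hypothetical.
import scala.util.parsing.combinator.RegexParsers

sealed abstract class Statement
case class Print(s: String) extends Statement
case class Millis() extends Statement

object SelectiveSkip extends RegexParsers {
  // Default whitespace skipping stays on for the String/Regex parsers used in `code`.

  // One character of template text: anything except the start of "[[".
  // acceptIf/elem work on single Chars and never skip whitespace.
  def textChar: Parser[Char] =
    acceptIf(_ != '[')(c => s"unexpected '$c'") | (elem('[') <~ not(elem('[')))

  def text: Parser[Statement] = rep(textChar) ^^ { cs => Print(cs.mkString) }

  // A possibly empty statement, so that "[[; millis; ]]" is accepted.
  def stat: Parser[Option[Statement]] = opt("millis" ^^^ Millis())

  def code: Parser[List[Statement]] = "[[" ~> repsep(stat, ";") <~ "]]" ^^ (_.flatten)

  // text ([[ code ]] text)*
  def template: Parser[List[Statement]] = text ~ rep(code ~ text) ^^ {
    case first ~ rest => first :: rest.flatMap { case stats ~ t => stats :+ t }
  }
}
```

This keeps the grammar free of explicit `ws` parsers, at the cost of building the template text one character at a time.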
