Threading extra state through a parser in Scala


Question

I'll give you the tl;dr up front

I'm trying to use the state monad transformer in Scalaz 7 to thread extra state through a parser, and I'm having trouble doing anything useful without writing a lot of t m a -> t m b versions of m a -> m b methods.

An example parsing problem

Suppose I have a string containing nested parentheses with digits inside them:

val input = "((617)((0)(32)))"

I also have a stream of fresh variable names (characters, in this case):

val names = Stream('a' to 'z': _*)

I want to pull a name off the top of the stream and assign it to each parenthetical expression as I parse it, and then map that name to a string representing the contents of the parentheses, with the nested parenthetical expressions (if any) replaced by their names.

To make this more concrete, here's what I'd want the output to look like for the example input above:

val target = Map(
  'a' -> "617",
  'b' -> "0",
  'c' -> "32",
  'd' -> "bc",
  'e' -> "ad"
)

There may be either a string of digits or arbitrarily many sub-expressions at a given level, but these two kinds of content won't be mixed in a single parenthetical expression.

To keep things simple, we'll assume that the stream of names will never contain either duplicates or digits, and that it will always contain enough names for our input.

Using parser combinators with a bit of mutable state

The example above is a slightly simplified version of the parsing problem in this Stack Overflow question. I answered that question with a solution that looked roughly like this:

import scala.util.parsing.combinator._

class ParenParser(names: Iterator[Char]) extends RegexParsers {
  def paren: Parser[List[(Char, String)]] = "(" ~> contents <~ ")" ^^ {
    case (s, m) => (names.next -> s) :: m
  }

  def contents: Parser[(String, List[(Char, String)])] = 
    "\\d+".r ^^ (_ -> Nil) | rep1(paren) ^^ (
      ps => ps.map(_.head._1).mkString -> ps.flatten
    )

  def parse(s: String) = parseAll(paren, s).map(_.toMap)
}

It's not too bad, but I'd prefer to avoid the mutable state.
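
For comparison, the most direct way to avoid the mutable iterator is to pass the remaining names into every parsing function and hand them back with the result. The following is only a minimal hand-written sketch, not code from the original post: it uses no parser library, the names `ExplicitThreading`, `Names`, `Bindings` and `parens` are made up for illustration, and it assumes a `List[Char]` of names rather than a `Stream`.

```scala
object ExplicitThreading {
  type Names    = List[Char]            // remaining fresh names (assumed: a List, not a Stream)
  type Bindings = List[(Char, String)]  // name -> contents pairs collected so far

  // Parses one parenthesised expression at the start of `in`.
  // Returns the unconsumed input, the unused names, and the bindings,
  // with the binding for this expression at the head of the list.
  def paren(in: List[Char], names: Names): Option[(List[Char], Names, Bindings)] =
    in match {
      case '(' :: rest =>
        contents(rest, names).flatMap {
          // The name is taken only after the contents have been parsed,
          // so inner expressions receive their names first.
          case (')' :: after, n :: ns, (s, m)) => Some((after, ns, (n -> s) :: m))
          case _                               => None
        }
      case _ => None
    }

  // Either a run of digits or one or more nested parenthesised expressions.
  def contents(in: List[Char], names: Names): Option[(List[Char], Names, (String, Bindings))] = {
    val (digits, rest) = in.span(_.isDigit)
    if (digits.nonEmpty) Some((rest, names, (digits.mkString, Nil)))
    else parens(in, names).collect {
      case (after, ns, ps) if ps.nonEmpty =>
        (after, ns, (ps.map(_.head._1).mkString, ps.flatten))
    }
  }

  // Zero or more parenthesised expressions, threading the names left to right.
  def parens(in: List[Char], names: Names): Option[(List[Char], Names, List[Bindings])] =
    paren(in, names) match {
      case Some((rest, ns, b)) => parens(rest, ns).map { case (r, n, bs) => (r, n, b :: bs) }
      case None                => Some((in, names, Nil))
    }

  // Succeeds only if the whole input is consumed.
  def parse(s: String, names: Names): Option[Map[Char, String]] =
    paren(s.toList, names).collect { case (Nil, _, bs) => bs.toMap }
}
```

Calling `ExplicitThreading.parse("((617)((0)(32)))", ('a' to 'z').toList)` should give the target map from the example wrapped in `Some`. Every function has to accept and return the names by hand, which is exactly the plumbing a state monad is meant to hide.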

What I want

Haskell's Parsec library makes adding user state to a parser trivially easy:

import Control.Applicative ((*>), (<$>), (<*))
import Data.Map (fromList)
import Text.Parsec

paren = do
  (s, m) <- char '(' *> contents <* char ')'
  h : t  <- getState
  putState t
  return $ (h, s) : m
  where
    contents
      =  flip (,) []
     <$> many1 digit
     <|> (\ps -> (map (fst . head) ps, concat ps))
     <$> many1 paren

main = print $
  runParser (fromList <$> paren) ['a'..'z'] "example" "((617)((0)(32)))"

This is a fairly straightforward translation of my Scala parser above, but without mutable state.

What I've tried

I'm trying to get as close to the Parsec solution as I can using Scalaz's state monad transformer, so instead of Parser[A] I'm working with StateT[Parser, Stream[Char], A]. I have a "solution" that allows me to write the following:

import scala.util.parsing.combinator._
import scalaz._, Scalaz._

object ParenParser extends ExtraStateParsers[Stream[Char]] with RegexParsers {
  protected implicit def monadInstance = parserMonad(this)

  def paren: ESP[List[(Char, String)]] = 
    (lift("(" ) ~> contents <~ lift(")")).flatMap {
      case (s, m) => get.flatMap(
        names => put(names.tail).map(_ => (names.head -> s) :: m)
      )
    }

  def contents: ESP[(String, List[(Char, String)])] =
    lift("\\d+".r ^^ (_ -> Nil)) | rep1(paren).map(
      ps => ps.map(_.head._1).mkString -> ps.flatten
    )

  def parse(s: String, names: Stream[Char]) =
    parseAll(paren.eval(names), s).map(_.toMap)
}

This works, and it's not that much less concise than either the mutable state version or the Parsec version.

But my ExtraStateParsers is ugly as sin. I don't want to try your patience more than I already have, so I won't include it here (although here's a link, if you really want it: https://gist.github.com/3747234). I've had to write new versions of every Parser and Parsers method I use above for my ExtraStateParsers and ESP types (rep1, ~>, <~, and |, in case you're counting). If I had needed to use other combinators, I'd have had to write new state transformer-level versions of them as well.

Is there a cleaner way to do this? I'd love to see an example of Scalaz 7's state monad transformer being used to thread state through a parser, but Scalaz 6 or Haskell examples would also be useful and appreciated.

Solution

Probably the most general solution would be to rewrite Scala's parser library to accommodate monadic computations while parsing (like you partly did), but that would be quite a laborious task.

I suggest a solution using Scalaz's State where each of our results isn't a value of type Parser[X], but a value of type Parser[State[Stream[Char],X]] (aliased as ParserS[X]). So the overall parsed result isn't a value, but a monadic state value, which is then run on some Stream[Char]. This is almost a monad transformer, but we have to do the lifting/unlifting manually. It makes the code a bit uglier, as we sometimes need to lift values or use map/flatMap in several places, but I believe it's still reasonable.

import scala.util.parsing.combinator._
import scalaz._
import Scalaz._
import Traverse._

object ParenParser extends RegexParsers with States {
  type S[X] = State[Stream[Char],X];
  type ParserS[X] = Parser[S[X]];


  // Haskell's `return` for States
  def toState[S,X](x: X): State[S,X] = gets(_ => x)

  // Haskell's `mapM` for State
  def mapM[S,X](l: List[State[S,X]]): State[S,List[X]] =
    l.traverse[({type L[Y] = State[S,Y]})#L,X](identity _);

  // .................................................

  // Read the next character from the stream inside the state
  // and update the state to the stream's tail.
  def next: S[Char] = state(s => (s.tail, s.head));


  def paren: ParserS[List[(Char, String)]] =
    "(" ~> contents <~ ")" ^^ (_ flatMap {
      case (s, m) => next map (v => (v -> s) :: m)
    })


  def contents: ParserS[(String, List[(Char, String)])] = digits | parens;
  def digits: ParserS[(String, List[(Char, String)])] =
    "\\d+".r ^^ (_ -> Nil) ^^ (toState _)
  def parens: ParserS[(String, List[(Char, String)])] =
    rep1(paren) ^^ (mapM _) ^^ (_.map(
        ps => ps.map(_.head._1).mkString -> ps.flatten
      ))


  def parse(s: String): ParseResult[S[Map[Char,String]]] =
    parseAll(paren, s).map(_.map(_.toMap))

  def parse(s: String, names: Stream[Char]): ParseResult[Map[Char,String]] =
    parse(s).map(_ ! names);
}

object ParenParserTest extends App {
  {
    println(ParenParser.parse("((617)((0)(32)))", Stream('a' to 'z': _*)));
  }
}
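
To make the roles of `toState`, `mapM` and `next` above easier to follow without Scalaz on the classpath, here is a small stand-alone sketch of a state type with the same shape. It is not the Scalaz implementation: `MiniState`, `pure` and `sequence` are illustrative names, and the state is a plain `List[Char]` instead of a `Stream[Char]`.

```scala
object MiniState {
  // A state computation: takes a state and returns the new state plus a result.
  final case class State[S, A](run: S => (S, A)) {
    def map[B](f: A => B): State[S, B] =
      State { s =>
        val (s1, a) = run(s)
        (s1, f(a))
      }

    def flatMap[B](f: A => State[S, B]): State[S, B] =
      State { s =>
        val (s1, a) = run(s)
        f(a).run(s1) // the second step sees the state left by the first
      }

    // Run the computation and keep only the result.
    def eval(s: S): A = run(s)._2
  }

  // Counterpart of `toState`: a result that leaves the state untouched.
  def pure[S, A](a: A): State[S, A] = State(s => (s, a))

  // Counterpart of `mapM`: run the computations left to right and collect the results.
  def sequence[S, A](l: List[State[S, A]]): State[S, List[A]] =
    l.foldRight(pure[S, List[A]](Nil)) { (sa, acc) =>
      sa.flatMap(a => acc.map(a :: _))
    }

  // Counterpart of `next`: take the head of the name list, keep the tail as the new state.
  val next: State[List[Char], Char] = State(s => (s.tail, s.head))
}
```

For example, `MiniState.sequence(List(MiniState.next, MiniState.next)).eval("abc".toList)` evaluates to `List('a', 'b')`. Nothing consumes a name until `eval` is called, which mirrors how the parser above first produces a state computation and only then runs it against the stream of names.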


Note: I believe that your approach with StateT[Parser, Stream[Char], _] isn't conceptually correct. The type says that we're constructing a parser given some state (a stream of names), so different streams could give us different parsers. This is not what we want. We only want the result of parsing to depend on the names, not the whole parser. In this respect Parser[State[Stream[Char],_]] seems more appropriate (Haskell's Parsec takes a similar approach: the state/monad is inside the parser).
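
The difference between the two shapes can be sketched with a toy parser type. Everything here is an assumption made for illustration: `Parser` is a simplified stand-in for the real combinator type, and `StateTParser` only approximates how a state transformer over a parser unfolds into a function.

```scala
object Shapes {
  // A toy parser: consumes a prefix of the input, or fails with None.
  type Parser[A] = String => Option[(A, String)]

  // Approximate shape of StateT[Parser, S, A]:
  // the state must be supplied before a parser even exists.
  type StateTParser[S, A] = S => Parser[(S, A)]

  // Shape of Parser[State[S, A]]:
  // parsing comes first and produces a computation that still awaits its state.
  type ParserOfState[S, A] = Parser[S => (S, A)]

  // Parses a run of digits; once names are supplied, pairs the digits with the next name.
  val named: ParserOfState[List[Char], (Char, String)] = { in =>
    val (digits, rest) = in.span(_.isDigit)
    if (digits.isEmpty) None
    else {
      val withName: List[Char] => (List[Char], (Char, String)) =
        names => (names.tail, names.head -> digits)
      Some((withName, rest))
    }
  }
}
```

With the second shape the input is parsed once and the pending result can be run against different name lists: `Shapes.named("617").map { case (pending, _) => pending("xyz".toList)._2 }` gives `Some(('x', "617"))`, and passing `"abc".toList` to the same `pending` function would pair the digits with `'a'` instead.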
