Turning a list/sequence of combinator parsers into a single one

Question

I have a list of values from which I can construct a list of parsers, that depend on these values by mapping (see example). Then what I want to do is turn the list of parsers into a single parser by concatenation.

One possibility is using foldLeft and ~:

parsers.foldLeft(success(List.empty[List[String]])) { case (ps, p) => ps ~ p ^^ { case xs ~ x => x :: xs } } ^^ (_.reverse)

Is this efficient?

I don't know how combinator parsers work internally; will there be a call stack whose depth equals the length of the list? If so, could I run into stack overflow errors for very long concatenations?

Is there a different way that is more readable?

Suppose you have a file with two lines. The first line contains n integers x_1 to x_n. The second line contains x_1 + x_2 + ... + x_n integers that belong to groups according to the first line. I want to take the sequence of integers from the first line and create n parsers p_1 to p_n, where p_i parses x_i integers.

Suppose I have the list of integers l = List(1,2,3) from the first line. For each integer n I create a parser that parses n integers: parsers = l.map(repN(_,integer)).

Answer

What you're describing (and what you've more or less reinvented in your implementation with foldLeft and ~) is essentially Haskell's sequence for monads (really you only need an applicative functor, but that's irrelevant here). sequence takes a list of monadic values and returns a monadic list of values. Parser is a monad, so sequence for Parser would change a List[Parser[A]] into a Parser[List[A]].

Scalaz gives you sequence, but off the top of my head I don't know if there's a nice way to get the necessary Applicative instance for Parser. Fortunately you can roll your own pretty easily (I'm directly translating the Haskell definition):

import scala.util.parsing.combinator._

object parser extends RegexParsers {
  val integer = """\d+""".r

  val counts = List(1, 2, 3)
  val parsers = counts.map(repN(_, integer))

  val line = parsers.foldRight(success(Nil: List[List[String]])) {
    (m, n) => for { x <- m ; xs <- n } yield (x :: xs)
  }

  def apply(s: String) = parseAll(line, s)
}

This gives us List(List(1), List(2, 3), List(4, 5, 6)) for parser("1 2 3 4 5 6"), as desired.

(Note that I'm using RegexParsers here as a convenient complete example, but the approach works more generally.)
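As a rough illustration of the "works more generally" point, the fold can be factored into a reusable helper. This is only a sketch: the object name SequenceSketch and the methods sequence and run are hypothetical names chosen for the example, not part of scala.util.parsing.combinator, and it assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator._

// Hypothetical wrapper object; only RegexParsers, success, repN and
// parseAll come from the library.
object SequenceSketch extends RegexParsers {
  val integer: Parser[String] = """\d+""".r

  // Combine a list of parsers into one parser that runs them in order
  // and collects their results in a list (same right fold as above,
  // but generic in the result type A).
  def sequence[A](ps: List[Parser[A]]): Parser[List[A]] =
    ps.foldRight(success(List.empty[A])) { (p, acc) =>
      for { x <- p; xs <- acc } yield x :: xs
    }

  // Build one repN parser per count, sequence them, and parse the input.
  def run(counts: List[Int], input: String): ParseResult[List[List[String]]] =
    parseAll(sequence(counts.map(n => repN(n, integer))), input)
}
```

Calling SequenceSketch.run(List(1, 2, 3), "1 2 3 4 5 6") should succeed with the same grouping as the example above, while an input with too few integers should yield an unsuccessful ParseResult.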

What's going on might be a little clearer if we desugar the for comprehension:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current.flatMap(x => acc.map(x :: _))
}

We can write flatMap as into and map as ^^:

val line = parsers.foldRight(success(Nil: List[List[String]])) {
  (current, acc) => current into (x => acc ^^ (x :: _))
}

This isn't too far from your formulation, except that we're using a right fold instead of reversing and aren't building up and breaking down the ~s.
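To make the comparison concrete, here is a sketch that puts the two formulations side by side. The object name FoldComparison and the value names viaLeftFold and viaRightFold are made up for this example; the left fold is the question's version with an explicit element type for the initial success parser, which is an assumption about what the asker intended.

```scala
import scala.util.parsing.combinator._

// Hypothetical object used only to compare the two folds.
object FoldComparison extends RegexParsers {
  val integer: Parser[String] = """\d+""".r
  val parsers: List[Parser[List[String]]] =
    List(1, 2, 3).map(n => repN(n, integer))

  // Left fold with ~: each step builds a ~ pair, takes it apart again,
  // and prepends, so the accumulated list must be reversed at the end.
  val viaLeftFold: Parser[List[List[String]]] =
    parsers.foldLeft(success(List.empty[List[String]])) { (ps, p) =>
      ps ~ p ^^ { case xs ~ x => x :: xs }
    } ^^ (_.reverse)

  // Right fold with into and ^^: results come out in order, and no
  // intermediate ~ values are created.
  val viaRightFold: Parser[List[List[String]]] =
    parsers.foldRight(success(List.empty[List[String]])) { (p, acc) =>
      p into (x => acc ^^ (x :: _))
    }
}
```

Both parsers should produce the same grouped result for the same input; the difference lies in how the intermediate results are assembled.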

About efficiency: Both of our implementations are going to result in unpleasant call stacks. In my experience this is just a fact of life with Scala's parser combinators. To quote another Stack Overflow answer, for example:

Scala's parser combinators aren't very efficient. They weren't designed to be. They're good for doing small tasks with relatively small inputs.

My sequence-y approach addresses the "more readable" part of your question, and is almost certainly the cleanest way to solve the problem with Scala's parser combinators. It's marginally more efficient than your implementation, and should be fine for a few thousand groups or so. If you need to handle more than that, you'll have to look outside of scala.util.parsing.combinator. I'd recommend something like the following:

def parse(counts: Seq[Int], input: String): Option[Seq[Seq[Int]]] = {
  val parsed = try {
    Some(input.split(" ").map(_.toInt))
  } catch {
    case _ : java.lang.NumberFormatException => None
  }

  parsed.flatMap { ints =>
    if (ints.length != counts.sum) None
    else Some(counts.foldLeft((Seq.empty[Seq[Int]], ints)) {
      case ((collected, remaining), count) => {
        val (m, n) = remaining.splitAt(count)
        (m.toSeq +: collected, n)
      }
    }._1.reverse)
  }
}

No guarantees, but on my system it doesn't overflow on a line with 100k integer groups.
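For the two-line file described in the question, the same plain-collections idea might look roughly like the sketch below. The names TwoLineSketch, parse and ints are hypothetical, and the input layout (counts on the first line, values on the second, whitespace separated) is an assumption taken from the question's description.

```scala
import scala.util.Try

// Hypothetical helper for the two-line layout; uses only the standard library.
object TwoLineSketch {
  // Parse one whitespace-separated line of integers; None on any bad token.
  private def ints(line: String): Option[Seq[Int]] =
    Try(line.trim.split("\\s+").toSeq.map(_.toInt)).toOption

  def parse(text: String): Option[Seq[Seq[Int]]] =
    text.split("\n").toList match {
      case first :: second :: Nil =>
        for {
          counts <- ints(first)
          values <- ints(second)
          if values.length == counts.sum
        } yield {
          // scanLeft gives the start offset of every group.
          val offsets = counts.scanLeft(0)(_ + _)
          counts.zip(offsets).map { case (n, start) =>
            values.slice(start, start + n)
          }
        }
      case _ => None // not exactly two lines
    }
}
```

Because it slices by precomputed offsets rather than nesting parsers, the recursion depth does not grow with the number of groups.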
