How can I ignore non-matching preceding text when using Scala's parser combinators?

Question

I really like parser combinators, but I'm not happy with the solution I've come up with for extracting data when I don't care about the text that precedes the relevant part.

Consider this small parser for monetary amounts:

import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

object MyParser extends JavaTokenParsers {
  def number = floatingPointNumber ^^ (_.toDouble)
  def currency = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount = (number ~ currency) ^^ {
    case num ~ curr => Amount(num, curr)
  } | currency ~ number ^^ {
    case curr ~ num => Amount(num, curr)
  }

  def junk = """\S+""".r
  def amountNested: Parser[Any] = amount | junk ~> amountNested
}

As you can see, I can easily get an Amount back if I give the parser a string that begins with valid data:

scala> MyParser.parse(MyParser.amount, "101.41 EUR")
res7: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(101.41,EUR)

scala> MyParser.parse(MyParser.amount, "EUR 102.13")
res8: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(102.13,EUR)

However, it fails when non-matching text comes before the amount:

scala> MyParser.parse(MyParser.amount, "I have 101.41 EUR")
res9: MyParser.ParseResult[Amount] = 
[1.2] failure: Unknown currency code: I

I have 101.41 EUR
 ^

My solution is the amountNested parser, which recursively tries to find an Amount. This works, but it gives a ParseResult[Any]:

scala> MyParser.parse(MyParser.amountNested, "I have 101.41 EUR")
res10: MyParser.ParseResult[Any] = [1.18] parsed: Amount(101.41,EUR)

This loss of type information (which can of course be 'retrieved' with pattern matching) seems unfortunate, because any successful result will contain an Amount.

Is there a way to keep searching my input ("I have 101.41 EUR") until I either find a match or run out of input, without ending up with a Parser[Any]?

Looking at the ScalaDocs, it seems the * method on Parser might help, but all I get are failures or infinite loops when I try things like:

def amount2 = ("""\S+""".r *) ~> amount
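
A likely explanation for the failures, sketched under assumptions: repeating `\S+` with `*` (the same as `rep`) greedily consumes every whitespace-separated token, including the amount itself. These combinators do not backtrack into a finished repetition, so `amount` is then attempted at the end of the input and fails. A repeated pattern that can match the empty string (for example `\S*`) could presumably account for the infinite loops, since such a repetition never advances. One possible workaround is to stop the repetition as soon as an amount comes next, using the library's `not` combinator. The names `SkipDemo`, `greedy`, and `amountAfterJunk` are made up for this illustration.

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

// Hypothetical demo object; the basic parsers mirror the ones in the question.
object SkipDemo extends JavaTokenParsers {
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)
  def currency: Parser[String] = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount: Parser[Amount] =
    (number ~ currency) ^^ { case num ~ curr => Amount(num, curr) } |
    (currency ~ number) ^^ { case curr ~ num => Amount(num, curr) }

  def junk: Parser[String] = """\S+""".r

  // Same shape as amount2: rep(junk) swallows every token, so amount
  // is only attempted at the end of the input, where it fails.
  def greedy: Parser[Amount] = rep(junk) ~> amount

  // Skip a token only while an amount does NOT start at the current position.
  def amountAfterJunk: Parser[Amount] = rep(not(amount) ~> junk) ~> amount
}
```

With these definitions, `SkipDemo.parse(SkipDemo.amountAfterJunk, "I have 101.41 EUR")` should succeed with an Amount, while `greedy` fails on the same input.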

Recommended answer

It type-checks fine if you declare amountNested as Parser[Amount]:

def amountNested: Parser[Amount] = amount | junk ~> amountNested
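
To make this concrete, here is a self-contained sketch of the question's parser with that one-line change. A recursive method needs an explicit result type, and since both alternatives of `amountNested` produce an `Amount`, `Parser[Amount]` is accepted where the question used `Parser[Any]`. The object name `TypedParser` and the helper `firstAmount` are hypothetical, added only for illustration.

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

// Hypothetical object name; the parsers are the ones from the question.
object TypedParser extends JavaTokenParsers {
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)
  def currency: Parser[String] = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount: Parser[Amount] =
    (number ~ currency) ^^ { case num ~ curr => Amount(num, curr) } |
    (currency ~ number) ^^ { case curr ~ num => Amount(num, curr) }

  def junk: Parser[String] = """\S+""".r

  // Either an amount starts here, or one token is dropped and the search continues.
  def amountNested: Parser[Amount] = amount | junk ~> amountNested

  // Hypothetical helper: the first amount found in the text, if there is one.
  def firstAmount(text: String): Option[Amount] =
    parse(amountNested, text) match {
      case Success(found, _) => Some(found)
      case _                 => None
    }
}
```

Because the result is now statically an `Amount`, callers such as `firstAmount` can use it directly without a cast or a type-testing pattern match.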
