Ignoring an arbitrary prefix in a parser combinator


Problem description

After getting fed up with regexes, I have been trying to use Scala's parser combinator library as a more intuitive replacement. However, I've run into a problem when I want to search a string for a pattern and ignore whatever comes before it. For example, if I want to check whether a string contains the word "octopus", I can do something like

val r = "octopus".r
r.findFirstIn("www.octopus.com")

which correctly gives Some(octopus).

However, using parser combinators

import scala.util.parsing.combinator._
object OctopusParser extends RegexParsers {

  def any = regex(".".r)*
  def str = any ~> "octopus" <~ any

  def parse(s: String) = parseAll(str, s) 
}

OctopusParser.parse("www.octopus.com")

I get an error:

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = 
[1.16] failure: `octopus' expected but end of source found

www.octopus.com

Is there a good way to accomplish this? From playing around, it seems that 'any' is swallowing too much of the input.

Recommended answer

The problem is that your 'any' parser is greedy: it matches the whole line, and because repetition in parser combinators does not backtrack to give input back, nothing is left for the rest of 'str' to parse.
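To see the greediness directly, here is a minimal sketch that runs the question's 'any' parser on its own and reports how far it got. The object name GreedyDemo and the helper consumed are made up for this illustration, and the sketch assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator._

// Hypothetical demo object, not part of the original question or answer.
object GreedyDemo extends RegexParsers {
  // Same parser as 'any' in the question, written with rep instead of postfix *.
  def any: Parser[List[String]] = rep(regex(".".r))

  // Run only 'any' (without requiring end of input) and return the offset
  // at which it stopped, i.e. how many characters it consumed.
  def consumed(s: String): Int = parse(any, s).next.offset
}
```

For "www.octopus.com", consumed returns the full length of the string, so the literal "octopus" that follows in 'str' is tried at the end of the input, which matches the "end of source found" failure above.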

You might want to try something like:

import scala.util.parsing.combinator._

object OctopusParser extends RegexParsers {

  // Match anything other than a dot, followed by a single dot (once only)
  def prefix = regex("""[^\.]*\.""".r)
  // Grab any number of remaining ".xxx" blocks
  def postfix = rep(regex("""\..*""".r))
  def str = prefix ~> "octopus" <~ postfix

  def parse(s: String) = parseAll(str, s)
}

which then gives me

scala> OctopusParser.parse("www.octopus.com")
res0: OctopusParser.ParseResult[String] = [1.13] parsed: octopus

You may need to adjust 'prefix' to match the range of input you are expecting. If a regex is being too greedy, you can make its quantifier reluctant with the '?' lazy marker (for example .*? instead of .*).
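As one possible way to apply the lazy marker, the sketch below makes the prefix a reluctant regex that stops right before the first "octopus", using a lookahead. This is an illustration rather than the answerer's code: the object name LazyOctopusParser, the parser 'rest' and the helper 'find' are assumptions introduced here, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator._

// Hypothetical variant of the answer's parser, for illustration only.
object LazyOctopusParser extends RegexParsers {
  // Reluctant match: consume as little as possible, stopping just before
  // the first "octopus". (?s) lets '.' match newlines as well.
  def prefix: Parser[String] = regex("""(?s).*?(?=octopus)""".r)

  // Whatever follows the word; greedy is fine here because nothing comes after it.
  def rest: Parser[String] = regex("""(?s).*""".r)

  def str: Parser[String] = prefix ~> "octopus" <~ rest

  // Mirror findFirstIn: Some(word) on success, None otherwise.
  def find(s: String): Option[String] = parseAll(str, s) match {
    case Success(word, _) => Some(word)
    case _                => None
  }
}
```

With this prefix the parser no longer depends on where the dots are, so it also accepts input in which "octopus" appears at the very start or after several dot-separated parts.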
