Parslet word until delimiter present


Question

I'm just starting with Ruby and Parslet, so this might be obvious to others (hopefully).

I want to get all the words up until a delimiter (^) without consuming it.

The following rule works (but consumes the delimiter), with a result of {:wrd=>"otherthings"@0, :delim=>"^"@11}:

require 'parslet'
class Mini < Parslet::Parser
  rule(:word) { match('[a-zA-Z]').repeat }
  rule(:delimeter) { str('^') }
  rule(:othercontent) { word.as(:wrd) >> delimeter.as(:delim) }
  root(:othercontent)
end
puts Mini.new.parse("otherthings^")

I then tried to use 'present?':

require 'parslet' 
class Mini < Parslet::Parser
  rule(:word) { match('[a-zA-Z]').repeat}
  rule(:delimeter) { str('^') }
  rule(:othercontent) { word.as(:wrd) >> delimeter.present? }
  root(:othercontent)
end
puts Mini.new.parse("otherthings^")

But this throws an exception:

Failed to match sequence (wrd:WORD &DELIMETER) at line 1 char 12. (Parslet::ParseFailed)

At a later stage I'll want to inspect the word to the right of the delimiter to build up a more complex grammar, which is why I don't want to consume the delimiter.

I'm using Parslet 1.5.0.

Thanks for your help!

Recommended answer

TL;DR: If you care what is before the "^", you should parse that first.

--- Longer answer ---

A parser always consumes all of the text. If it can't consume everything, then the document is not fully described by the grammar. That is why the 'present?' version fails: the lookahead checks for the "^" without consuming it, so the "^" is left over at the end of the input. Rather than thinking of the parser as something performing "splits" on your text, think of it as a clever state machine consuming a stream of text.

So, as your full grammar needs to consume the whole document, you can't make the parser handle some part and leave the rest while you are developing it. You want it to transform your document into a tree so you can manipulate it into its final form.

If you really wanted to just consume all text before a delimiter, then you could do something like this.

Say I was going to parse a '^'-separated list of things.

I might have the following rules:

rule(:thing) { (str("^").absent? >> any).repeat(1) }  # anything that's not a ^
rule(:list)  { thing >> ( str("^") >> thing).repeat(0) } #^ separated list of things

This would work as follows:

parse("thing1^thing2") #=> "thing1^thing2"
parse("thing1") #=> "thing1"
parse("thing1^") #=> ERROR ... nothing after the ^ there should be a 'thing'

This means list matches a string that neither starts nor ends with a '^'. To be useful, however, I need to pull out the bits that are the values, using the "as" method:

rule(:thing) { (str("^").absent? >> any).repeat(1).as(:thing) }
rule(:list)  { thing >> ( str("^") >> thing).repeat(0) }

Now when list matches a string, I get an array of hashes of "things":

parse("thing1^thing2") #=> [ {:thing=>"thing1"@0} , {:thing=>"thing2"@7} ] 

In reality, however, you probably care what a "thing" is; not just anything can go there.

In that case, you should start by defining those rules, because you don't want to use the parser to split by "^" and then re-parse the strings to work out what they are made of.

For example:

parse("6 + 4 ^ 2") 
 # => [ {:thing=>"6 + 4 "@0}, {:thing=>" 2"@7} ]

Here I probably want to ignore the whitespace around the "thing"s, and I probably want to deal with the 6, the + and the 4 separately. When I do that, I am going to have to throw away my "all things that aren't '^'" rule.
