Indentation sensitive parser using Parslet in Ruby?
Problem description
I am attempting to parse a simple indentation sensitive syntax using the Parslet library within Ruby.
The following is an example of the syntax I am attempting to parse:
level0child0
level0child1
 level1child0
 level1child1
  level2child0
 level1child2
The resulting tree would look like so:
[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]
The parser that I have now can parse nesting level 0 and 1 nodes, but cannot parse past that:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  rule(:indent)     { str(' ') }
  rule(:newline)    { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  # A node is an identifier followed by zero or more singly indented children,
  # so only nesting levels 0 and 1 are recognised.
  rule(:node)     { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }
  rule(:document) { node.repeat }

  root :document
end

require 'ap' # provided by the awesome_print gem
require 'pp'

begin
  input = DATA.read
  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input
  tree = IndentationSensitiveParser.new.parse(input)
  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree
rescue Parslet::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  # Newer Parslet releases expose the error tree as parse_failure_cause;
  # older releases used failure.cause.
  puts failure.parse_failure_cause.ascii_tree
end

__END__
user
 name
 age
recipe
 name
foo
bar
It's clear that I need a dynamic counter that expects 3 indentation nodes to match an identifier at nesting level 3.
How can I implement an indentation sensitive syntax parser using Parslet in this way? Is it possible?
Recommended answer
There are a few approaches.

1. Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.
2. Use captures to store the current indent and expect the next node to include that indent plus more to match as a child (I didn't dig into this approach much, as the next one occurred to me).
3. Rules are just methods, so you can define 'node' as a method, which means you can pass parameters (as follows).
This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. A recursive call would therefore never finish, because building node(depth) immediately builds node(depth+1), and so on.
This is why dynamic exists. It returns a parser whose block is not evaluated until the parser actually tries to match, which lets you recurse without problems.
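The effect of deferring construction can be seen without Parslet. The following is a minimal plain-Ruby sketch, not Parslet's API: the helpers literal, identifier, many, deferred and parse_document are hypothetical names invented for this illustration, and one space per indentation level is an assumption. Here deferred plays the role that dynamic plays in Parslet.

```ruby
# Illustration only: hypothetical mini-combinators, not Parslet's API.
# A parser is a lambda (input, pos) -> [value, new_pos], or nil on failure.

# Match an exact string at the current position.
def literal(text)
  ->(input, pos) { input[pos, text.length] == text ? [text, pos + text.length] : nil }
end

# Match one or more alphanumeric characters.
def identifier
  lambda do |input, pos|
    m = /\G[A-Za-z0-9]+/.match(input, pos)
    m && [m[0], m.end(0)]
  end
end

# Apply `parser` zero or more times and collect the values.
def many(parser)
  lambda do |input, pos|
    values = []
    while (result = parser.call(input, pos))
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

# Postpone building a parser until a match is attempted.
def deferred(&builder)
  ->(input, pos) { builder.call.call(input, pos) }
end

# A node at `depth`, followed by any number of nodes one level deeper.
# Without `deferred`, node(depth + 1) would be built immediately and the
# construction would recurse without end.
# `unit` is one indentation level (assumed: a single space).
def node(depth, unit = ' ')
  children = deferred { many(node(depth + 1, unit)) }
  lambda do |input, pos|
    indent = literal(unit * depth).call(input, pos)
    next nil unless indent
    name = identifier.call(input, indent[1])
    next nil unless name
    pos = name[1]
    pos += 1 if input[pos] == "\n" # optional newline
    kids, pos = children.call(input, pos)
    [{ identifier: name[0], children: kids }, pos]
  end
end

# Parse a whole document, failing if any input is left over.
def parse_document(text)
  tree, pos = many(node(0)).call(text, 0)
  raise ArgumentError, "could not parse past offset #{pos}" unless pos == text.length
  tree
end
```

Replacing the deferred block with a direct call to many(node(depth + 1, unit)) makes node call itself while the parser is still being built, which is the runaway recursion described above.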
In Parslet, the code looks like this:
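require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  # Exactly `depth` indentation units.
  def indent(depth)
    str(' ' * depth)
  end

  rule(:newline)    { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  # A node at `depth`, followed by any number of nodes one level deeper.
  # `dynamic` defers building node(depth + 1) until match time, so the
  # recursion does not run away while the grammar is being constructed.
  def node(depth)
    indent(depth) >>
      identifier >>
      newline.maybe >>
      (dynamic { |s, c| node(depth + 1).repeat(0) }).as(:children)
  end

  rule(:document) { node(0).repeat }

  root :document
end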
This is my favoured solution.
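For comparison, the first approach listed above (recognise each line as indents plus an identifier, then rebuild the hierarchy afterwards) can be sketched in plain Ruby. This is only a sketch under assumptions: flat_lines and build_tree are hypothetical helper names, the { depth:, identifier: } hash shape stands in for whatever a flat Parslet parse and transform would actually produce, and one space per indentation level is assumed.

```ruby
# Turn raw text into flat line records by counting leading indentation units.
# `indent` is one indentation level (assumed: a single space).
def flat_lines(text, indent = ' ')
  text.each_line.map(&:chomp).reject(&:empty?).map do |raw|
    stripped = raw.sub(/\A(?:#{Regexp.escape(indent)})*/, '')
    { depth: (raw.length - stripped.length) / indent.length, identifier: stripped }
  end
end

# Rebuild the nested tree from the flat records using a stack of open ancestors.
def build_tree(lines)
  root = { children: [] }
  stack = [root] # stack[d] is the parent of any node at depth d
  lines.each do |line|
    depth = line[:depth]
    # A line may be at most one level deeper than the previous line.
    raise ArgumentError, "unexpected indent at #{line[:identifier]}" if depth >= stack.length

    node = { identifier: line[:identifier], children: [] }
    stack = stack[0..depth] # close every node deeper than this one
    stack.last[:children] << node
    stack << node
  end
  root[:children]
end

text = "level0child0\nlevel0child1\n level1child0\n level1child1\n  level2child0\n level1child2\n"
tree = build_tree(flat_lines(text))
```

The trade-off is that the grammar stays trivial (every line has the same shape), while all of the nesting logic moves into the post-processing step.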