Parsing a possibly nested braced item using a grammar
Question
I am starting to write a BibTeX parser. The first thing I would like to do is parse a braced item. A braced item could be an author field or a title, for example. There might be nested braces within the field. The following code does not handle nested braces:
use v6;
my $str = q:to/END/;
author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},
END
$str .= chomp;
grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item { '{' <-[}]>* '}' }
}
ExtractBraced.parse( $str ).say;
Output:
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
braced-item => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}」
Now, in order to make the parser accept nested braces, I would like to keep a counter of the number of opening braces currently parsed and decrement it whenever a closing brace is encountered. If the counter reaches zero, we assume that we have parsed the complete item.
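Outside of a grammar, the counting idea described above can be sketched as a plain loop. This is only an illustration: the helper name extract-braced is made up for this example, it assumes $start is a valid position in the text, and it does not treat escaped braces such as \{ specially.

```raku
# Hypothetical helper (not part of any grammar in this post): returns the
# balanced braced item that starts at position $start, or Nil if there is none.
sub extract-braced(Str $text, Int $start = 0) {
    return Nil unless $text.substr($start, 1) eq '{';
    my $depth = 0;
    for $start ..^ $text.chars -> $i {
        my $char = $text.substr($i, 1);
        $depth++ if $char eq '{';
        $depth-- if $char eq '}';
        # Depth zero means the brace opened at $start has been closed.
        return $text.substr($start, $i - $start + 1) if $depth == 0;
    }
    return Nil;    # ran out of input: the braces are unbalanced
}

my $field = 「{Belayneh, M. and Matth{\"{a}}i, S.K.},」;
say extract-braced($field);
```

The loop stops at the first position where the depth returns to zero, so anything after the matching closing brace (here the trailing comma) is left out.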
To follow this idea, I tried to split up the braced-item regex so that a grammar action runs on each character. (The action method on the braced-item-char regex below should then handle the brace counter):
grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item { '{' <braced-item-char>* '}' }
    rule braced-item-char { <-[}]> }
}
However, the parsing now suddenly fails. It is probably a silly mistake, but I cannot see why it should fail now.
Answer
Without knowing how you want the resulting data to look, I would change it to something like this:
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;
grammar ExtractBraced {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-item {
        '{' ~ '}'
        [
        || <- [{}] >+
        || <.before '{'> <.braced-item>
        ]*
    }
}
ExtractBraced.parse( $str ).say;
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
author => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}」
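The grammar above uses token where the question used rule, and as far as I can tell that difference is the likely reason the second grammar in the question fails. A rule turns whitespace after an atom into an implicit <.ws> call, and the default ws refuses to match in the middle of a word. A per-character rule therefore cannot match a single letter of a word such as Belayneh, the quantified call matches zero times, and the closing brace is never reached. The reduced grammars below are a sketch of that behaviour; the names RuleVersion, TokenVersion and braced-char are made up for the illustration.

```raku
# Reduced versions of the failing grammar; all names here are hypothetical.
grammar RuleVersion {
    rule TOP         { '{' <braced-char>* '}' }
    rule braced-char { <-[}]> }     # trailing space becomes an implicit <.ws>
}
grammar TokenVersion {
    token TOP         { '{' <braced-char>* '}' }
    token braced-char { <-[}]> }    # whitespace in a token is not significant
}

# Compare whether each grammar accepts a word inside braces.
say so RuleVersion.parse('{abc}');
say so TokenVersion.parse('{abc}');
```

If this reading is right, only the token version accepts the input, which is why switching the per-character regex from rule to token is enough to make the original attempt parse again.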
If you want a bit more structure, it might look more like this:
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;
grammar ExtractBraced {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-part {
        || <- [{}] >+
        || <.before '{'> <braced-item>
    }
    token braced-item {
        '{' ~ '}'
        <braced-part>*
    }
}
class Print {
    method TOP ($/){
        make $<author>.made
    }
    method braced-part ($/){
        make $<braced-item>.?made // ~$/
    }
    method braced-item ($/){
        make [~] @<braced-part>».made
    }
}
my $r = ExtractBraced.parse( $str, :actions(Print) );
say $r;
put();
say $r.made;
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
 author => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}」
  braced-part => 「Belayneh, M. and Geiger, S. and Matth」
  braced-part => 「{\"{a}}」
   braced-item => 「{\"{a}}」
    braced-part => 「\"」
    braced-part => 「{a}」
     braced-item => 「{a}」
      braced-part => 「a」
  braced-part => 「i, S.K.」
Belayneh, M. and Geiger, S. and Matth\"ai, S.K.
Note that the + on <-[{}]>+ is an optimization, as is <before '{'>; both can be omitted and the grammar will still work.
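To make that concrete, here is a sketch of the first grammar with both optimizations removed. The grammar name ExtractBracedPlain is made up for this example; everything else follows the grammar shown earlier.

```raku
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;

# Same structure as the first grammar, minus the + and the <before '{'> lookahead.
grammar ExtractBracedPlain {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-item {
        '{' ~ '}'
        [
        || <-[{}]>           # one non-brace character at a time
        || <.braced-item>    # fails right away unless the next char is '{'
        ]*
    }
}

say ExtractBracedPlain.parse( $str )<author>;
```

Without the +, the group simply loops once per character, and without the lookahead the nested call fails on its own leading '{', so the captured text should be the same and only the amount of work differs.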