Parse single-quoted string using Marpa::R2 in Perl
Question
How do I parse a single-quoted string using Marpa::R2? In the code below, single-quoted strings come back with '\' added after parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Output:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs for all three inputs. I don't want the third one to come back with the '\' and the single quotes; I want it to be parsed like the second output. Please advise.
Ideally, it should pick up just the content between the single quotes, according to Param ::= (') Quoted (')
Recommended answer
The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
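On the Data::Dumper point, a minimal sketch (the variable name $parsed is made up for illustration) shows that the backslashes only appear in Dumper's single-quoted rendering and are not part of the string itself:

```perl
use strict;
use warnings;
use Data::Dumper;

# A five-character string: quote, f, o, o, quote. No backslash in it.
my $parsed = q{'foo'};

# Plain print shows the raw characters.
print "raw:    $parsed\n";
print "length: ", length($parsed), "\n";

# Dumper emits Perl source in single quotes, so it has to escape
# the embedded single quotes with backslashes.
my $dumped = Dumper($parsed);
print "dumped: $dumped";
```

So the string returned by the parser still contains the two quote characters, but not the backslashes.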
When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:
- Unquoted ~ [^\s\/\(\),&:\"~]+
- '"'
- ') Quoted ('
Yes, the last is literally ) Quoted (, not anything containing a single quote.
Even if it were ([']) Quoted ([']), the parse would still not do what you want: due to longest token matching, the Unquoted lexeme will match the entire input, including the single quotes.
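As a rough illustration, the character class from the question's Unquoted lexeme can be tried as an ordinary Perl regex. This only approximates what the Marpa lexer does (the variable names are made up), but it shows that the class does not exclude the single quote, so the match runs over the whole input:

```perl
use strict;
use warnings;

# Character class copied from the question's Unquoted lexeme,
# used here as a plain Perl regex rather than inside Marpa.
my $unquoted_re = qr/[^\s\/\(\),&:\"~]+/;

my $input = q{'foo'};

# Match as much as possible from the start of the input.
my ($token) = $input =~ /^($unquoted_re)/;

printf "matched %d of %d characters: %s\n",
    length($token), length($input), $token;
```

Because this candidate covers all five characters, no alternative that first consumes only the opening quote can be longer.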
What would happen for an input like " foo " (with double quotes)? Now only the '"' lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then the closing " is matched.
To prevent this whitespace-skipping behaviour, and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can do this either with the event system (which is conceptually clean, but a bit cumbersome to implement) or by adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = @_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}
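Putting the pieces together, a complete script might look like the following sketch. Several details here are assumptions rather than part of the answer: the `:default ::= action => [values]` pseudo-rule stands in for the question's default_action argument, the action is resolved through `semantics_package => 'main'`, whitespace discarding is left out, and the helper parse_param is a made-up name.

```perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

my $grammar = Marpa::R2::Scanless::G->new({
    source => \(<<'END_OF_SOURCE'),
:default ::= action => [values]
lexeme default = latm => 1
:start ::= Expression
Expression ::= Param
Param ::= Unquoted | Quoted
Quoted ::= Quoted_Lexeme action => process_quoted
Unquoted ~ [^'"]+
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
END_OF_SOURCE
});

# Action for the proxy production: strips the surrounding delimiters.
# The first argument is the per-parse object, which is ignored here.
sub process_quoted {
    my (undef, $s) = @_;
    return $1 if $s =~ /^"(.*)"$/s;
    return $1 if $s =~ /^'(.*)'$/s;
    die "String was not delimited with single or double quotes";
}

# Hypothetical helper: parse one input and return the value tree.
sub parse_param {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        semantics_package => 'main',   # where process_quoted is looked up
    });
    $recce->read(\$input);
    my $value_ref = $recce->value;
    die "No parse for input: $input" unless defined $value_ref;
    return ${$value_ref};
}

for my $input (q{foo}, q{"foo"}, q{'foo'}) {
    print "Trying to parse:\n$input\n\nOutput:\n", Dumper(parse_param($input));
}
```

With this arrangement the quoted inputs are each matched as a single lexeme, and the delimiters are removed in the action rather than by the grammar.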