Parse single quoted string using Marpa::R2 in Perl

Question

How can I parse a single-quoted string using Marpa::R2? In the code below, a single-quoted string comes back with '\' added when it is parsed.

Code:

use strict;
use Marpa::R2;
use Data::Dumper;


my $grammar = Marpa::R2::Scanless::G->new(
   {  default_action => '[values]',
      source         => \(<<'END_OF_SOURCE'),
  lexeme default = latm => 1

:start ::= Expression

# include begin

Expression ::= Param
Param ::= Unquoted                                         
        | ('"') Quoted ('"') 
        | (') Quoted (')

:discard      ~ whitespace 
whitespace    ~ [\s]+

Unquoted      ~ [^\s\/\(\),&:\"~]+
Quoted        ~ [^\s&:\"~]+

END_OF_SOURCE
   });

my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);

Output:

Trying to parse:
foo

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
"foo"

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
'foo'

Output:
$VAR1 = [
          [
            '\'foo\''
          ]
        ]; (don't want it to be parsed like this)

Above are the outputs for all three inputs. I don't want the third one to come back with the '\' and the single quotes; I want it to be parsed like the second output. Please advise.

Ideally, it should pick just the content between the single quotes, according to Param ::= (') Quoted (')

Recommended answer

The other answer regarding the Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
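The Data::Dumper point can be checked in isolation with a minimal sketch that does not involve Marpa at all (the variable names are illustrative, not from the question). The backslashes appear only in Dumper's quoted rendering of the value; the string itself holds five characters.

```perl
use strict;
use warnings;
use Data::Dumper;

# Terse drops the "$VAR1 = " prefix so only the literal is printed.
local $Data::Dumper::Terse = 1;

# The value the parser returned: quote, f, o, o, quote.
my $lexeme = q{'foo'};

# Dumper emits Perl source for the value, so inner single quotes are escaped.
my $dumped = Dumper($lexeme);

print "length: ", length($lexeme), "\n";
print "raw:    $lexeme\n";
print "dumped: $dumped";
```

Printing the value directly, rather than through Dumper, shows it without the escapes.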

When you parse the input 'foo', Marpa considers the three Param alternatives. The predicted lexemes at that position are:

  • Unquoted ~ [^\s\/\(\),&:\"~]+
  • '"'
  • ') Quoted ('

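Yes, the last one is literally ) Quoted (, not anything containing a single quote. The two apostrophes in (') Quoted (') delimit a string literal, so the rule reads as an opening parenthesis, the string ') Quoted (', and a closing parenthesis.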

Even if the rule were ([']) Quoted ([']), it would not help: because of longest token matching, the Unquoted lexeme would match the entire input, including the single quotes.
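The longest-match claim can be checked with an ordinary Perl regex built from the same character class. This only approximates a single lexeme in isolation and is not how Marpa's lexer is implemented; $unquoted, $input and $match are illustrative names.

```perl
use strict;
use warnings;

# Same character class as the Unquoted lexeme in the question's grammar.
my $unquoted = qr/[^\s\/\(\),&:\"~]+/;

my $input = q{'foo'};

# Anchor at the start of the input, where the lexer would begin matching.
my ($match) = $input =~ /\A($unquoted)/;

# The class does not exclude the single quote, so nothing stops the match
# before the end of the input.
print "matched: $match\n";
print "whole input consumed\n" if length($match) == length($input);
```

Since the class never excludes the apostrophe, a separate quote token followed by Quoted can only ever match a shorter prefix at that position.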

What would happen for an input like " foo " (with double quotes)? Here only the '"' lexeme would match first, then any whitespace would be discarded, then the Quoted lexeme would match, then whitespace would be discarded again, and finally the closing " would be matched.

To prevent this whitespace-skipping behaviour, and to keep the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*

These lexemes will then include the quotes and any escapes, so you need to post-process the lexeme contents. You can do this either with the event system (conceptually clean, but a bit cumbersome to implement) or by adding an action that performs the processing during parse evaluation.

Since lexemes cannot have actions, it is usually best to add a proxy production:

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*

The action could then do something like:

sub process_quoted {
  my (undef, $s) = @_;
  # remove delimiters from double-quoted string
  return $1 if $s =~ /^"(.*)"$/s;
  # remove delimiters from single-quoted string
  return $1 if $s =~ /^'(.*)'$/s;
  die "String was not delimited with single or double quotes";
}
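
Putting the pieces together, a complete script might look like the sketch below. It assumes Marpa::R2 is installed, sets the default action inside the DSL with a :default statement, and tells the recognizer to resolve action names in package main via semantics_package. The parse_param helper is hypothetical and exists only to keep the example short.

```perl
use strict;
use warnings;
use Marpa::R2;

my $grammar = Marpa::R2::Scanless::G->new({
    source => \(<<'END_OF_SOURCE'),
:default ::= action => ::first
lexeme default = latm => 1

Param         ::= Unquoted | Quoted
Quoted        ::= Quoted_Lexeme action => process_quoted

Unquoted      ~ [^'"]+
Quoted_Lexeme ~ DQ | SQ
DQ            ~ '"' DQ_Body '"'
DQ_Body       ~ [^"]*
SQ            ~ ['] SQ_Body [']
SQ_Body       ~ [^']*
END_OF_SOURCE
});

# Action for the proxy production: strip the delimiters from the lexeme.
sub process_quoted {
    my (undef, $s) = @_;
    return $1 if $s =~ /^"(.*)"$/s;
    return $1 if $s =~ /^'(.*)'$/s;
    die "String was not delimited with single or double quotes";
}

# Hypothetical helper: parse one input and return the resulting string.
sub parse_param {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        semantics_package => 'main',    # package in which process_quoted is looked up
    });
    $recce->read(\$input);
    my $value_ref = $recce->value;
    die "No parse for: $input" if not defined $value_ref;
    return ${$value_ref};
}

for my $input ('foo', '"foo"', q{'foo'}) {
    printf "%-7s => %s\n", $input, parse_param($input);
}
```

With this grammar the quotes are part of the lexeme, so whitespace inside a quoted string is kept rather than discarded, and the action alone decides what is stripped.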
