Boost.Spirit: Setup sub-grammar during parsing


Question


To handle large compile times and reuse of grammars I've composed my grammar into several sub-grammars which are called in sequence. One of them (call it: SETUP grammar) offers some configuration of the parser (via symbols parser), so later sub grammars logically depend on that one (again via different symbols parsers). So, after SETUP is parsed, the symbols parsers of the following sub grammars need to be altered.

My question is, how to approach this efficiently while preserving loose coupling between the sub grammars?

Currently I see only two possibilities:

  • The on_success handler of the SETUP grammar, which could do the work, but this would introduce quite some coupling.
  • After the SETUP, parse everything into a string, build up a new parser (from the altered symbols) and parse that string in a second step. This would leave quite some overhead.

What I would like to have is an on_before_parse handler, which could be implemented by any grammar that needs to do some work before each parse. From my point of view, this would introduce less coupling, and some setup of the parser could come in handy in other situations, too. Is something like this possible?

Update:

Sorry for being sketchy, that wasn't my intention.

The task is to parse an input I with some keywords like #task1 and #task2. But there will be cases where these keywords need to be different, say $$task1 and $$task2.

So the parsed file will start with

setup {
  #task1=$$task1
  #task2=$$task2
}

realwork {
  ...
}

Some code sketches: Given is a main parser, consisting of several (at least two) parsers.

template<typename Iterator>
struct MainParser: qi::grammar<Iterator, Skipper<Iterator>> {

  MainParser() : MainParser::base_type(start) {
    // the sub-grammars are called in sequence
    start = setup >> realwork;
  }

  Setup<Iterator>    setup;     // parses the setup { ... } block
  RealWork<Iterator> realwork;  // parses the realwork { ... } block

  qi::rule<Iterator, Skipper<Iterator> > start;
};

Setup and RealWork are themselves parsers (my sub parsers from above). During the setup part, some keywords of the grammar may be altered, so the setup part has a qi::symbols<char, keywords> rule. In the beginning these symbols will contain #task1 and #task2. After parsing the first part of the file, they contain $$task1 and $$task2.

Since the keywords have changed and since RealWork needs to parse I, it needs to know about the new keywords. So I have to transfer the symbols from Setup to RealWork during the parsing of the file.

The two approaches I see are:

  • Make the Setup aware of RealWork and transfer the symbols from Setup to RealWork in the qi::on_success handler of Setup. (bad, coupling)
  • Switch to two parsing steps. The start rule of MainParser will look like

    start = setup >> unparsed_rest;

    and there will be a second parser after MainParser. Schematically:

    SymbolTable Table;
    string Unparsed_Rest;
    MainParser.parse(Input, (Unparsed_Rest, Table));

    RealWorkParser.setupFromAlteredSymbolTable(Table);
    RealWorkParser.parse(Unparsed_Rest);

    (bad, overhead of several parsing steps)

So, up to now, attributes don't come into play. It is just about changing the parser at parse time to handle several kinds of input files.

My hope is a handler qi::on_before_parse, analogous to qi::on_success. The idea is that this handler would be triggered each time the parser starts parsing an input. Theoretically it is just an interception at the beginning of parsing, like the interceptions we already have with on_success and on_error.

Solution

Sadly, you showed no code, and your description is a bit... sketchy. So here's a fairly generic answer that addresses some of the points I was able to distill from your question:

Separation of concerns

It sounds very much like you need to separate AST building from transformation/processing steps.

Parser composition

Of course you can compose grammars. Simply compose grammars as you would rules and hide the implementation of these grammars in any traditional way you would (pImpl idiom, const static internal rules, whatever fits the bill).

However, the composition usually doesn't require an 'event' driven element: if you feel the need to parse in two phases, it sounds to me you're just struggling to keep the overview, but recursive descent or PEG grammars are naturally well-suited to describe grammars like that in one swoop (or one pass, if you will).

However, if you find that

(a) your grammar gets complicated
(b) or you want to be able to selectively plug in subgrammars depending on runtime features

You could consider

  1. The Nabialek trick (I've shown/mentioned this on several occasions in my boost-spirit answers on this site).
  2. You could build rules dynamically (this is not readily recommended because you'll run into deadly traps having to do with copying Proto expression trees, which leads to dangling references). I have also shown this in some answers on occasion.

    REPEAT: don't try this unless you know how to detect UB and fix things with Proto.

Hope these things help you on track. If not, I suggest you come back with a concrete question. I'm much more at home with code than 'ideas' because ideas often mean something else to you than to me.
