源代码解析和类似结构的类宏处理 [英] Source code parsing and macro-like handling of similar constructions

查看:84
本文介绍了源代码解析和类似结构的类宏处理的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

TL; DR版本:是否有支持以下内容的解析器生成器:当某些规则被简化时(我假设LALR(1)解析器),则不执行归约操作,但解析器将退出并用其他代码替换输入使用此规则中的值并解析该代码.如果需要,请重复.因此,如果代码是"i ++"并且规则是"expr POST_INCR",则我可以做更多或更少的事情:

TL;DR VERSION: Is there a parser generator that supports the following: when some rule is reduced (I assume LALR(1) parser), then reduction isn't performed, but parser backs off and replaces input with different code using values from this rule and parses that code. Repeat if needed. So if code is "i++" and rule is "expr POST_INCR", I can do more or less:

expr POST_INCR -> "(tmp = expr; expr = expr + 1; tmp)"

那么基本上是使用宏重写代码吗?

So basicaly code rewriting using macros?

长版:

我写了另一种简单的解释语言(为简单起见,用Java写).可以,但是提出了一些问题.简介很长,但是很简单,可以帮助我清楚地显示出我的问题.

I wrote yet another simple interpreted language (in Java for simplicity). It works ok, but it raised some question. Introduction is pretty long, but simple and helps to shows my problem clearly (I think).

我有"while"循环.鉴于:

I have "while" loop. It is pretty simple, given:

WHILE LPARE boolean_expression RPAREN statement

我或多或少地产生以下内容:

I generate more or less the following:

new WhileNode(boolean_expression, statement); 

这将创建一个新节点,稍后再访问该节点时,将为我的虚拟机生成代码.但我也有以下内容:

This creates new node that, when visited later, generates code for my virtual machine. But I also have the following:

FOR LPAREN for_init_expr SEMICOLON boolean_expression SEMICOLON for_post_expr RPAREN statement

这是Java或C已知的"for循环".根据上述规则,我或多或少地创建了以下内容:

This is "for loop" known from Java or C. From aforementioned rule I create more or less the following:

new ListNode(
    for_init_expr,
    new WhileNode(
        boolean_expression,
        new ListNode(
            statement,
            new ListNode(for_post_expr, null))))

这当然是简单的转换,来自:

This is of course simple transformation, from:

for (for_init ; boolean_expression ; for_post_expr)
    statement

收件人:

for_init
while (boolean_expression) {
    statement
    for_post_expr;
}

一切都很好,但由于以下原因,事情变得繁琐:

All is fine and dandy, but things get hairy for the following:

FOR LPAREN var_decl COLON expression RPAREN statement

这是众所周知和喜欢的:

This if well known and liked:

for (int x : new int[] { 1, 2 })
    print(x);

我避免发布会生成AST的代码,因为基本的for循环已经有点长了,而我们得到的结果则更加糟糕.这种构造等于:

I refrain from posting code that generates AST, since basic for loop was already a little bit long, and what we get here is ever worse. This construction is equal to:

int[] tmp = new int[] { 1, 2 };
for (int it = 0 ; it < tmp.length; it = it + 1) {
    int x = tmp[it];
    print(x);
}

由于我没有使用类型,所以我只是假设表达式"(在右侧,在COLON之后)是我可以迭代的内容(数组是不可迭代的),因此我会在的结果上调用一个函数此表达式"返回Iterable的实例.因此,事实上,我重写的代码并不像上面的代码那么简单,或多或少是这样的:

And since I'm not using types, I simply assume that "expression" (so right side, after COLON) is something that I can iterate over (and arrays are not iterable), I call a function on a result of this "expression" which returns instance of Iterable. So, in fact, my rewritten code isn't as simple as one above, is more or less this:

Iterator it = makeIterable(new int[] { 1, 2 });
while (it.hasNext()) {
    int x = it.next();
    print(x);
}

看起来还不错,但是请注意,AST会生成三个函数调用和while循环.为了告诉您这是什么混乱,我发布了我现在拥有的东西:

It doesn't look THAT bad, but note that AST for this generates three function calls and while loop. To show you what mess it is, I post what I have now:

113     | FOR LPAREN var_decl_name.v PIPE simple_value_field_or_call.o RPAREN statement.s
114                                         {: Symbol sv = ext(_symbol_v, _symbol_o);
115                                            String autoVarName = generateAutoVariableName();
116                                            Node iter = new StatementEndNode(sv, "",
117                                                new BinNode(sv, CMD.SET, "=",
118                                                    new VarDeclNode(sv, autoVarName),
119                                                    new CallNode(sv, "()",
120                                                        new BinNode(sv, CMD.DOT, ".",
121                                                            new MkIterNode(sv, o),
122                                                            new PushConstNode(sv, "iterator")))));
123                                            Node varinit = new StatementEndNode(sv, "",
124                                                new BinNode(sv, CMD.SET, "=",
125                                                    v,
126                                                    new PushConstNode(sv, "null")));
127                                            Node hasnext = new CallNode(sv, "()",
128                                                new BinNode(sv, CMD.DOT, ".",
129                                                    new VarNode(sv, autoVarName),
130                                                    new PushConstNode(sv, "hasNext")));
131                                            Node vargennext = new StatementEndNode(sv, "",
132                                                new BinNode(sv, CMD.SET, "=",
133                                                    new VarNode(sv, v.name),
134                                                    new CallNode(sv, "()",
135                                                        new BinNode(sv, CMD.DOT, ".",
136                                                            new VarNode(sv, autoVarName),
137                                                            new PushConstNode(sv, "next")))));
138                                            return new ListNode(sv, "",
139                                                new ListNode(sv, "",
140                                                    new ListNode(sv, "",
141                                                        iter
142                                                    ),
143                                                    varinit
144                                                ),
145                                                new WhileNode(sv, "while",
146                                                    hasnext,
147                                                    new ListNode(sv, "",
148                                                        new ListNode(sv, "",
149                                                            vargennext
150                                                        ),
151                                                        s)));

回答您的问题:是的,我为这段代码感到羞耻.

To answer your questions: yes, I am ashamed of this code.

问题:是否有解析器生成器让我们对它进行一些处理,即给定规则:

QUESTION: Is there are parser generator that let's me do something about it, namely given rule:

FOR LPAREN var_decl COLON expr RPAREN statement

告诉解析器重写它,就好像它是别的东西一样.我认为这将需要某种LISP的宏机制(由于基本上根本没有语法,因此很容易在lisp中使用),也许与此类似:

tell parser to rewrite it as if it was something else. I imagine that this would require some kind of LISP's macro mechanism (which is easy in lisp due to basically lack of grammar whatsoever), maybe similar to this:

FOR LPAREN var_decl COLON expr RPAREN statement =
    {   with [ x = generateAutoName(); ]
        emit [ "Iterator $x = makeIterable($expr).iterator();"
               "while (${x}.hasNext()) {"
               "$var_decl = ${x}.next();"
               "$statement"
               "}"
        ]
    }

我不知道这是否是一个众所周知的问题,我什至不知道要寻找什么-我发现的最相似的问题是这个问题:

I don't know if this is a well known problem or not, I simply don't even know what to look for - the most similar question that I found is this one: Any software for pattern-matching and -rewriting source code? but it isn't anywhere close to what I need, since it is supposed to work as a separate step and not during compilation, so it doesn't qualify.

任何帮助将不胜感激.

推荐答案

我认为您正试图过度弯曲解析器.您可以简单地构建 里面有宏的树,然后对树进行后处理,以用所需的任何替换来替换宏.

I think you are trying to bend the parser too much. You can simply build the tree with macro in it, and then post-process the tree to replace the macros with whatever substitution you want.

您可以通过遍历结果树,检测宏节点(或您要进行替换的地方),以及简单地通过程序树黑客进行替换来实现此目的.不漂亮,但可行.您应该能够使用任何解析器生成器/AST建筑机械的结果来做到这一点.

You can do this by walking the resulting tree, detecting the macro nodes (or places where you want to do substitutions), and simply splicing in replacements with procedural tree hackery. Not pretty but workable. You should able to do this with the result of any parser generator/AST building machinery.

如果您想要一种更结构化的方法,则可以构建AST,然后使用源到源的转换将宏重写"为它们的内容. 出 DMS软件再造工具包可以做到这一点,您可以阅读更多详细信息 关于转换的外观.

If you want a more structured approach, you could build your AST and then use source-to-source transformations to "rewrite" the macros to their content. Out DMS Software Reengineering Toolkit can do this, you can read more details about what the transforms look like.

使用DMS方法,您的概念:

Using the DMS approach, your concept:

expr POST_INCR -> "(tmp = expr; expr = expr + 1; tmp)"

要求您以常规方式解析原始文本 语法规则:

requires that you parse the original text in the conventional way with the grammar rule:

 term = expr POST_INCR ;

您可以将所有这些语法规则提供给DMS,然后让它 解析源并根据语法构建AST.

You would give all these grammar rules to DMS and let it parse the source and build your AST according to the grammar.

然后将DMS重写应用于结果树:

Then you apply a DMS rewrite to the resulting tree:

domain MyToyLanguage; -- tells DMS to use your grammar to process the rule below

rule replace_POST_INCR(e: expr): term->term
  = "\e POST_INCR" -> " (tmp = \e; \e = \e + 1; tmp) ";

此处的引号是域元引号",而不是字符串文字引号. 双引号外的文本是DMS规则语法.引号内的文本是您的语言(MyToyLangauge)的语法,并使用您提供的解析器进行解析,其中某些特殊的转义符用于\ e之类的模式变量. (您无需对语法进行任何操作即可获得这种模式分析功能; DMS会为此提供帮助.)

The quote marks here are "domain meta quotes" rather than string literal quotes. The text outside the double-quotes is DMS rule-syntax. The text inside the quotes is syntax from your language (MyToyLangauge), and is parsed using the parser you provided, some special escapes for pattern variables like \e. (You don't have to do anything to your grammar to get this pattern-parsing capability; DMS takes care of that).

根据与DMS的约定,我们通常将文字标记命名为POST_INCR 在词法分析器中使用带引号的等效"++",而不是使用这样的名称. 代替

By convention with DMS, we often name literal tokens like POST_INCR with a quoted equivalent '++' in the lexer, rather than using such a name. Instead of

#token POST_INCR "\+\+"

lexer规则如下:

the lexer rule then looks like:

#token '++'  "\+\+"

如果这样做,则语法规则如下:

If you do that, then your grammar rule reads like:

term = expr '++' ;

现在您的重写规则如下:

and your rewrite rule now looks like:

rule replace_POST_INCR(e: expr): term->term
  = "\e++" -> " (tmp = \e; \e = \e + 1; tmp) ";

遵循此约定,语法(词法分析器和BNF) (IMHO)更具可读性, 并且重写规则也更具可读性,因为它们保持不变 非常接近实际的语言语法.

Following this convention, the grammar (lexer and BNF) is (IMHO) a lot more readable, and the rewrite rules are more readable too, since they stay extremely close to the actual language syntax.

这篇关于源代码解析和类似结构的类宏处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆