形式方法在编译器中进行语义分析 [英] formal method to do semantic analysis in compiler

查看:169
本文介绍了形式方法在编译器中进行语义分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道有一种称为属性语法的形式主义和一种称为语法定向翻译的非形式主义方法,但第一种方法效率低下,后一种方法难以自动化。



有没有其他最近的语义分析形式主义

解决方案

OP建议属性语法定向翻译难以自动化。



我们的

/www.semanticdesigns.com/Products/DMS/DMSToolkit.htmlrel =nofollow> DMS软件重组工具包支持这两种活动及更多。



它提供用于完整上下文无关语法的解析器,以及用任意数据和操作来定义,编译和执行属性语法的能力,以及跨语法节点的任意流。可以计算度量,构建符号表,或使用这种属性语法计算语义分析。



给定(DMS)语法规则:

  LHS = RHS1 ... RHSN; 

一个为命名属性语法计算写入DMS属性语法规则 Pass1 (出于实际原因,可以有许多不同的传递,有些甚至建立彼此的结果):

 < ; Pass1>> ;: {LHS.propertyI = fn1(RHSx.propertyY,...); 
...
RHSa.propertyB = fn2(RHSp.propertyQ,...);对于一组(任意类型)属性,
...
}



<使用在DMS的底层(并行)语言PARLANSE中实现的涉及的类型上定义的任意函数fnI,在语法规则的左侧或右侧上与每个语法元素相关联。 DMS计算跨越该组规则的数据流,并确定实现计算的部分阶(并行)计算,并将其编译成PARLANSE代码以供执行。属性计算的结果是用计算的属性装饰的树。



小心,一次应该能够定义一个指称语义。 DS中的一个关键概念是环境,它将标识符映射到类型和可能的符号值。 (前者传统上称为符号表)。在引入新范围的AST节点处,可以通过将父环境与新引入的标识符相组合来将创建新环境的属性函数写入,并将其从AST节点传递到其子节点,例如规则

  exp ='let'ID'='exp1'in'exp2; 

可能会编写属性语法规则:

 << Denotation>> ;: {
exp2.env = augment_environment(exp.env,
new_variable_and_value_pair(name。ID。),
exp1.value));
exp.value = exp2.value;
}

我不知道OP是什么意思is)inefficient。我们使用DMS属性语法来计算C ++ 14的所有的语义属性(名称和类型解析)。
虽然这样的定义是大多数学术论文标准的巨大,这是因为C ++ 14本身是巨大的和令人惊讶的混乱(委员会的骆驼)。尽管如此,我们的属性语法似乎运行得很好。更重要的是,它对于一个非常小的团队来说是足够强大的(与团队支持Clang的规模形成对比)。



DMS还提供了使用源和目标(如果不同于源)语言的表面语法编码源到源变换(rewrites),形式为如果您看到 em> 。这些重写被应用于解析树以提供修改的树; DMS提供的prettyprinter(反解析器)然后可以重新生成目标语言的源代码。如果一个人限制自己重写,正确地平铺原来的AST,得到语法导向的翻译。 OP可能声称(语法导向翻译)难以自动化;我同意,但是工作已经完成并且可用。 OP不得不决定她想要定义和执行的规则。



DMS重写规则采取形式:

 规则rule_name(parameter1:syntax_category1,... parameterN ...)
:source_syntax_category - > target_syntax_category
<源语言中的文本>
- >
<目标语言的文本>
if condition_of_matched_source_pattern;

其中参数是语法类型子树的占位符,规则映射
a类型树source_syntax_category - > target_syntax_category(通常是同一个),
和...是用引号括起来的表达式语法的元引号,用\标记嵌入的转义。元引用的代码片段被解释为树的规范(使用读取源代码的相同解析引擎);这不是字符串匹配。例如:

 规则simplify_if_then_else(c:condition,t:then_clause,e:else_clause)
statement->语句
=if \c then \t else \e
- > \t
if c ==true;

更多语义的(上面纯语法)检查的概括将是



  ... 
如果can_determine_is_true(c);

它假设自定义谓词参考其他DMS可派生结果来决定实例化条件在点(其中匹配的树c携带它的源位置,因此上下文被隐含)。可以为期望的语言构建控制和数据流,并且使用结果数据流来确定到达条件c的值,然后总是以非平凡的方式证明是真实的。



我假设了一个DMS定义的支持谓词can_determine_if_true。这只是一点自定义的PARLANSE代码。



然而,由于重写将一个树转换成另一个树,可以重复地应用任意长/复杂的一组转换规则整个树。这给予DMS重写Post(string [generalized to tree])重写系统的能力,因此具有图灵能力。您可以在技术上使用足够的变换产生原始树的任意变换。通常使用DMS的其他特性使写变换有点容易;例如,重写规则可以查阅特定属性语法计算的结果,以便容易地使用来自远在树中的信息(个体重写规则总是具有固定的,最大的半径)。



DMS提供了大量额外的支持机制,以帮助构建控制流图和/或使用高效并行求解器计算数据流。 DMS还有各种可用的前端用于各种语言,如C,C ++ 14,Java1.8,IBM Enterprise COBOL,...可用,以便工具工程师可以专注于构建她想要的工具,而不是争取从头开始构建解析器(只是为了发现必须存在解析后的生活)。



如果OP最近对另一种风格(结构化操作)语义的概述感兴趣,他可以参阅
编程语言的语义



我们声称这些文件中的技术可以在DMS上实现。



想法。他们大多是研究工具而不成熟。一个这样的研究系统, JastAdd 是一个属性语法评估系统,我听说它在能力和性能上脱颖而出,但我没有具体的经验。


I know there is a formalism called attribute grammar,and a non-formalism method called syntax-directed translation,but the first is inefficient and the latter one is difficult to automate.

Does there exist other recent formalism about semantic analysis?

解决方案

OP suggests "attribute grammars" are inefficient, and syntax-directed translation is difficult to automate. I offer a proof-point showing otherwise, name a few other semantic systems, and suggest how they might be integrated, below.

Our DMS Software Reengineering Toolkit supports both of these activities and more.

It provides parsers for full context free grammars, and the ability to define, compile, and executed in parallel attribute grammars with arbitrary data and operations, and arbitrary flow across the syntax nodes. One can compute metrics, build symbol tables, or compute a semantic analysis with such attribute grammars.

Given a (DMS) grammar rule:

  LHS = RHS1 ... RHSN ;

one writes a DMS attribute grammar rule for a named attribute grammar computation Pass1 (for practical reasons, there can be many different passes, some even building one another's results) in the form:

  <<Pass1>>:  {  LHS.propertyI=fn1(RHSx.propertyY,...);
                 ...
                 RHSa.propertyB=fn2(RHSp.propertyQ,...);
                 ...
              }

for a set of (arbitrary type) properties associated with each grammar element, either on the left or right hand side of the grammar rule, using arbitrary functions fnI defined over the types involved, implemented in the DMS's underlying (parallel) language, PARLANSE. DMS computes the dataflows across the the set of rules, and determines a partial order (parallel) computation that achieves the computation, and compiles this into PARLANSE code for execution. The result of an attribute computation is a tree decorated with the computed properties.

With care, once should be able to define a denotational semantics of a language computed by an attribute grammar. One of the key notions in DS is that of an "environment", which maps identifiers to types and possibly symbolic values. (The former is traditionally called a symbol table). At AST nodes that introduce new scopes, one would write an attribute function that created an new environment by combining the parent environment with newly introduced identifiers, and pass that down from the AST node to its children, e.g., for the rule

exp = 'let' ID '=' exp1 'in' exp2;

one might code an attribute grammar rule:

<<Denotation>>: {
     exp2.env = augment_environment(exp.env,
                                    new_variable_and_value_pair(name(ID.),
                                                                exp1.value));
     exp.value=exp2.value;
               }

I'm not sure what the OP means by (attribute grammars are) "inefficient". We've used DMS attribute grammars to compute semantic properties (name and type resolution) for all of C++14. While such a definition is huge by most academic paper standards, it is that way because C++14 itself is huge and an astonishing mess ("camel by committee"). In spite of this, our attribute grammar seems to run well enough. More importantly, it is powerful enough for a very small team to build it (in contrast to the scale of "team" supporting Clang).

DMS also provides the ability to encode source-to-source transformations ("rewrites") using the surface syntax of the source and target (if different than source) languages, of the form, "if you see this, replace it by that". These rewrites are applied to the parse trees to provide revised trees; a prettyprinter ("anti-parser") provided by DMS can then regenerate source code for the target language. If one limits oneself to rewrites that exactly tile the original AST one gets "syntax-directed translation". OP might claim this (syntax directed translation) is difficult to automate; I'd agree but the work is done and available. OP does have to decide what rules she wants to define and execute.

DMS rewrite rules take the form:

 rule rule_name(parameter1:syntax_category1, ... parameterN...)
   :  source_syntax_category -> target_syntax_category
   "  <text in source language>  "
  ->
   "  <text in target language> "
  if  condition_of_matched_source_pattern;

where the parameters are placeholders for syntax-typed subtrees, the rule maps a tree of type source_syntax_category -> target_syntax_category (often the same one), and the "..." are meta-quotes wrapped around surface syntax with "\"-labelled embedded escapes for the parameters where needed. The meta-quoted code fragments are interpreted as specifications for trees (using the same parsing engine that reads the source code); this is not a string-match. An example:

  rule simplify_if_then_else(c:condition,t:then_clause,e:else_clause)
     statement->statement
  =  " if \c then \t else \e "
  -> " \t "
  if c == "true";

A generalization of the (above purely syntactic) check) which is more "semantic" would be

  ...
  if can_determine_is_true(c);

which assumes custom predicate that consults other DMS-derivable results to decide the instantiated condition is always true at the point where it is found (the matched tree c carries its source position with it, so the context is implied). One might build control and data flow for the desired language, and use the resulting dataflow to determine values that arrive at the condition c, which may then always turn out to be "true" in a nontrivial way.

I have assumed a DMS-defined support predicate "can_determine_if_true". This just a bit of custom PARLANSE code.

However, since the rewrites transform one tree into another tree, one can apply an arbitrarily long/complex set of transformation rules repeatedly to the entire tree. This gives DMS rewrites the power of a Post (string [generalized to tree]) rewriting system, thus Turing capable. You can technically produce any arbitrary transformation of the original tree with sufficient transforms. Usually one uses other features of DMS to make writing the transforms a bit easier; for instance, a rewrite rule may consult the result of a particular attribute grammar computation in order to "easily" use information from "far away in the tree" (individual rewrites rules always have a fixed, maximum "radius").

DMS provides a lot of additional support machinery, to help one construct control flow graphs and/or compute dataflow with efficient parallel solvers. DMS also has a wide variety of available front ends for various langauges such as C, C++14, Java1.8, IBM Enterprise COBOL, ... available so that a tool engineer can concentrate on building the tool she wants, rather than fighting to build a parser from scratch (only to discover that one must live Life After Parsing).

If OP is interested in an recent overview of another style of (structured operational) semantics, he might consult course notes for Semantics of Programming Languages. We claim the techniques in such papers can be implemented on top of DMS if one likes.

One can make a long list of various academic tools that implement (some) of these ideas. Most of them are research tools and not mature. One such research system, JastAdd is an attribute grammar evaluation system, and I hear that it stands out in capability and performance, but I have no specific experience with it.

这篇关于形式方法在编译器中进行语义分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆