以编程方式重命名函数 [英] Programmatically rename functions

查看:124
本文介绍了以编程方式重命名函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在编写一个ECMAScipt5编译器,它在解析树上执行各种给定的优化/转换,并编译回ECMAScipt5。



一项功能是重命名 EnvironmentRecord 中的绑定



这种转换可以自动执行,例如作为旨在减少代码大小的优化的一部分,其中每个变量(不在全局范围内)将被赋予下一个最短的可用名称,或者在引入新范围的语句之后通过注释手动。



但是,我必须将(自动)过程限制为变量声明。



考虑这两个例子。第一个,编译,指定 [Minify] 作为转换,第二个使用 [Directives,PrettyPrint]



语法: Compiler.fromSource(src).compile([/ *转换数组* /]);

  var bar,foo; 
(函数exampleMin(){
var bar =foo,
foo =bar;

函数fooBar(){
return foo + bar;
}
})

编译为



var bar,foo; function exampleMin(){var A =foo,B =bar; function fooBar(){return B + A}}



  var bar, FOO; 
(函数exampleMin(){
@Rename bar:A
@Rename foo:B
@Rename fooBar:C
var bar =foo,
foo =bar;

function fooBar(){
return foo + bar;
}
})

编译为

  var bar,foo ; 
函数exampleMin(){
var A =foo,B =bar;
函数C(){
返回B + A;
}
};

导致问题部分,功能...考虑以下

  if(fooBar.name ==='fooBar'){
// ...
}

现在,如果此语句包含在 exampleMin 中。用户定义的重命名会将代码转换为语义上不同的代码。通过自动执行的转换不会发生这种情况。



虽然我盲目地假设用户定义的函数重命名不会以某种方式改变语义,但我想生成一个警告如果可能的话。但我不知道如何确定以编程方式重命名函数是否安全



这让我想到了这些问题:


  1. 什么,除了访问a重命名函数时必须考虑函数名吗?

  2. 必须执行哪些分析才能将函数标记为可安全优化或不安全。是否有可能。

  3. 我是否愿意将函数排除在重命名之外,或者我是否会尝试更改其他方面。与函数名称的比较。 (如果可以证明它没有任何副作用)

  4. 在这种特定情况下,语义的变化是否可以容忍(CH:2)
    [2] 静态单一作业簿
    [4] < a href =http://www.ics.uci.edu/~lopes/teaching/inf212W12/readings/rep-analysis-soft.pdf =nofollow noreferrer>软件的表示和分析






    作为评论中提到的 @didierc ,使用括号进行属性访问时出现同样的问题符号。因此,对象环境记录 Bindings 只能手动重命名。

    解决方案

    假设你不能像@Phil H那样破解解释器,如果可能的话,这是一个有效的解决方案...



    编译器处理案例的正常方式像这样被称为数据流分析。这相当于以某种方式描述程序结构的代码行走计算值。在所有有趣的情况下,这些值都是对流估计。



    最流行的保守假设是,如果分支将执行,循环迭代的次数也是未知的,并且没有任何关于函数调用可能在副作用方面做什么的知道。复杂的过程间数据流分析允许丢弃最后一个。例如。 LLVM能够做到这一点,但许多其他编译系统都没有。



    这个问题的特殊保守假设是,如果 .name 或类似的内省用于一个值,然后它被污染。它无法重命名。通过数据流分析,您可以找到受保护的受污染名称列表。



    对于此问题,可能最适用的数据流分析是使用分析来构建使用链附在每个定义上。在javascript中,定义等同于assingment和函数命名。如果您的分析足够复杂以查看内部哈希值,那么添加键值对就是def。



    分析结果将是附加到每个定义的列表使用该定义中确定的值。保守的假设意味着该列表可能包括从未实际发生的用途。



    关于数据流分析的文献很多。但是开始学习它的一个很好的方法是旧的标准龙书,Aho Sethi和Ullman,编译器设计



    添加



    非常动态的语言存在的问题是准确的数据流分析很难。评论中的示例:

      var n ='name'; 
    function foo(){};
    if(foo [n] ==='foo')doSth()

    is典型。愚蠢的保守假设是,在任何索引操作 a [s] 中, s 可能 '名称'。所以 a 无法重命名。您必须通过使数据流分析更加详细 - 而不是估计值来克服愚蠢假设的限制。



    这方面的框架是 抽象解释 :实际运行程序时,抽象值域替换实际数据,再次对进行保守假设,如果 s和循环。如果抽象域具有某些属性,则保证有限长度执行以计算每个变量的最终值。常量折叠是一种简单的抽象解释形式。



    在您的示例中,抽象值域必须包含至少类似


    $的内容b $ b

      {unassigned,constant string,unknown} 

    在实践中,您还需要 null ,常数,函数和其他值。抽象解释术语会说底部而不是未分配和顶部而不是未知。



    所有这些都是旧的东西。因此,从60年代开始就有大量的正式文献,当时人们对优化lisp编译器非常感兴趣。如果您可以访问,请搜索ACM档案。


    I am currently writing an ECMAScipt5 compiler that performs various given optimizations/transformations on a parse tree and compiles back to ECMAScipt5.

    One functionality is to rename a Binding in an EnvironmentRecord.

    This transformation may either be performed automatically e.g. as part of an optimization that aims to reduce code size, where each variable (not in the global scope) will be given the next shortest available name, or manually by an annotation after a statement that introduces a new scope.

    However, I have to restrict the (automatic) process to variable declarations only.

    Consider those two examples. The first, compiled, specifying [Minify] as transformations, the second one using [Directives, PrettyPrint]

    Syntax: Compiler.fromSource (src).compile ([/*Array of transformations*/]);

    var bar,foo;
    (function exampleMin () {
        var bar="foo",
            foo="bar";
    
        function fooBar () {
            return foo + bar;
        }
    })
    

    compiles to

    var bar,foo;function exampleMin(){var A="foo",B="bar";function fooBar(){return B+A}}

    And

    var bar,foo;
    (function exampleMin () {
        @Rename bar:A
        @Rename foo:B
        @Rename fooBar:C
        var bar="foo",
            foo="bar";
    
        function fooBar () {
            return foo + bar;
        }
    })
    

    compiles to

    var bar,foo;
    function exampleMin(){
         var A="foo",B="bar";
         function C(){
              return B+A;
         }
    };
    

    Which leads to the problematic part, functions... consider the following

    if (fooBar.name === 'fooBar') {
     //...
    }
    

    Now, if this statement would be contained in exampleMin. The user defined rename would have transformed the code into a semantically different code. Which must not ever happen by an automatic performed transformation.

    While I blindly assume that user defined renaming of functions doesn't change the semantics somehow, I would like to produce a warning if that may be the case. But I don't know how to determine whether it's safe to rename a function programmatically or not.

    This brings me down to the questions:

    1. What, besides accessing a functions name has to be considered when renaming a function?
    2. What analysis has to be performed to mark a function as either safely optimizable or not. Is it possible at all.
    3. Would I rather exclude functions from being renamed or would I try to change the other side of e.g. a comparison against a functions name too. (If it can be proved to have no side effects either)
    4. Would a change in the semantics be tolerable in such a specific case (GCC seems to think so), if I, in exchange, offer a @do-not-optimize annotation?


    Update 1.

    I have come to the conclusion that this analysis might be not possible solely through static analysis

    Consider the following code.

    function foo () {}
    function bar () {}
    var fns = [bar,foo];
    
    if (fns [0].name === 'bar') fns [0] ();
    
    fns.unshift (foo);
    
    if (fns [1].name === 'bar') fns [1] ();
    

    I can't imagine how to track the references back to it's origin once a function has been added to an array, without executing the code. Maybe i would need some form of Abstract Interpretation1?


    Update2

    In the mean-time and after reading @Genes answer, I realized there are other few things that may not hurt to be added. First, some side notes:

    • Apparently I am not writing a compiler, but rather a preprocessor since it outputs sourcecode and not machine code.
    • Given that there would only be static accesses of bound Identifiers, I have a good idea on how to approach the problem.
    • Each Binding in every environment record currently holds a list to all its static references (I obviously couldn't add dynamic ones)

    I am currently working on the SSA[2] conversion. So I haven't yet implemented any DataFlow analysis yet. But that's on the plan.

    So for the sake of simplicity, let's just assume the following prerequisites would be met.

    • AST and CFG are in Static Single Assignment form.
    • GEN and KILL sets have been computed for each node in the CFG4
    • Reaching Definitions4 / IN and OUT sets have been computed.
    • DEF / USE pairs have been computed
    • flow dependence edges have been added to the CFG

      So the Control Flow Graph(s) for the first example could look something like this.

    • The black, non-dotted lines represent control flow edges.
    • The black dotted connections represent dataflow dependencies
    • The blue double arrow lines represent call sites.
    • The blue dotted lines represent interprocedural dependencies. I'm however not sure if i should make a direct connection between the corresponding nodes of each precedures CFG

    Given this, I could know simply perform the following.

    For each function that is about to be renamed:

    • Visit its declarations CFG node
    • For each flow dependency edge visit the target node
    • If that node is a conditional goto statement and the functions reference is the LHS of a property accessor with the RHS being "name".
      • Mark the function as tainted

    The only problem is I can't see how to compute (even approximate) that information for non-static references of a function.

    Soo, if that analysis doesn't help finding ALL references to a function, i could as well use the beforementioned list of references, that each Binding in an environment record holds. Since a function has a declarative environment record as well as an object environment record. I could simply take a look at the count of references of its object environments "name" Binding.

    As a reference, here is the actual code that currently performs the renaming

    var boundIdentifiers = this.environment.record.bindings, //`this` refers to an AST node representing a FunctionDeclaration or a FunctionExpression
        nextName, 
        identifier, 
        binding;
    
    for (identifier in boundIdentifiers) {
        binding = boundIdentifiers [identifier];
        if (binding.uses < 2 && !binding.FunctionExpression) {
            compiler.pushWarning (binding.references [0].line, binding.references [0].column,'Declared function ' + identifier + ' is never called.') //False positive if the functions reference is obtained dynamically
        }
    
        if (boundIdentifiers [identifier].FunctionDeclaration || boundIdentifiers [identifier].FunctionExpression) {
            continue; //Skip function declarations and expressions, since their name property could be accessed
        }
    
        do {
            nextName = nextVar (); 
        } while (
            Object.hasOwnProperty.call (boundIdentifiers,nextVar) //There could exist a `hasOwnProperty` binding.
        ); //ther could a with the name that already exists in the scope. So make sure we have assign a free name.
    
        this.environment.record.setBindingName (identifier, nextName);
    }
    

    So the overall problem boils down to catching non-static references

    What analysis techniques and prior optimizations would need to be involved to catch at least some (since its not possible to catch em all), non-static references.


    I modified the questions to fit the update. So, the above questions still apply

    [1]A Survey of Static Program Analysis Techniques (CH: 2) [2]Static Single Assignment Book [4]Representation and Analysis of Software


    As @didierc mentioned in the comments, the same problem arises for property accesses using the bracket notation. So Bindings of an object environment record can be renamed manually only.

    解决方案

    Assuming you can't "break" the interpreter as @Phil H says, which is a valid solution if possible...

    The normal way that compilers handle cases like this is called dataflow analysis. This amounts to a code walk computing values that describe the program structure in some ways. In all interesting cases, the values are converative estimates.

    The most prevalent conservative assumptions are that nothing is known about which if branch will execute, that the number of times a loop will iterate is also unknown, and that nothing is known about what a function call might do in terms of side effects. Sophisticated kinds of interprocedural dataflow analysis allow dropping the last one. E.g. LLVM is capable of this, but many other compilation systems are not.

    The special conservative assumption for this problem is that if .name or similar introspection is used on a value, then it's "tainted." It can't be renamed. Dataflow analysis will let you find a conservative list of tainted namees.

    For this problem, the dataflow analysis that probably applies best is def-use analysis to construct "use chains" attached to each "definition". In javascript, definition is equivalent to assingment and function naming. If your analysis is sophisticated enough to look inside hashes, then adding a key-value pair is a def.

    The analysis result will be a list attached to each definition of each "use" of the value established during that definition. The conservative assumptions mean that the list may include uses that never actually occur.

    There is a huge literature on dataflow analysis. But an excellent way to start learning about it is the old standard "dragon book", Aho Sethi and Ullman, Compiler Design.

    Addition

    The problem with very dynamic languages is that accurate dataflow analysis is hard. The example in comments:

    var n='name';
    function foo () {}; 
    if (foo[n] === 'foo') doSth ()
    

    is typical. The dumb conservative assumption is that in any indexing operation a[s], the s might be 'name'. So a can't be renamed. You must overcome the limits of dumb assumptions by making the dataflow analysis more detailed - less of an estimate.

    A framework for this is abstract interpretation: actually running the program with an abstract value domain replacing real data, again making conservative assumptions about ifs and loops. If the abstract domain has certain properties, a finite length execution is guarenteed to compute final values for each variable. Constant folding is a simple form of abstract interpretation.

    In your example, the abstract value domain would have to include at least something like

    { unassigned, constant string, unknown }
    

    In practice you'll also want null, constant numbers, functions, and other values. The abstract interpretation lingo would say "bottom" instead of "unassigned" and "top" instead of "unknown."

    All this is old stuff. Therefore there is an enormous formal literature on it starting in the 60's, when people were very interested in optimizing lisp compilers. If you can get access, search the ACM archives.

    这篇关于以编程方式重命名函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆