在编译器实现瓶盖 [英] Implementing Closures in a Compiler

查看:135
本文介绍了在编译器实现瓶盖的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图设计一个基本的编译器伪组装code。但是,我无法弄清楚如何实施封锁。看来我需要特定的寄存器值,每个子程序相关联。我考虑过使用栈,但再次似乎不足。这似乎是不折不扣的关联数组会的工作,但怎么可能,或者类似的东西,在组装完成?

I am attempting to design a basic compiler to pseudo-assembly code. However, I cannot figure out how to implement closures. It seems I would need to associate specific register values with each "subroutine". I've considered use of stacks, but once again it seems insufficient. It seems like nothing short of an associative array would work, but how could that, or something similar, be done in assembly?

我选择的例子来试图重新present如下,作为沟通的CoffeeScript的简洁。

The example I have chosen to attempt to represent is the following, communicated as CoffeeScript for conciseness.

((x) -> (y) -> x(y))((x) -> x)(2)

下面是我一直在努力的总体结构。这是伪组件的样品到我编译

Here is the general structure I have been trying. This is a sample of the pseudo-assembly to which I am compiling.

'((label lam1)   ;; (x) -> x
  (cp resp arg)
  (ret)

  (label lam2)   ;; (y) -> x(y)
  (jmp x)

  (label lam3)   ;; (x) -> [(y) -> ...]
  (cp x arg)     ;; this is the assignment intended for a closure
  (cp resp lam2) ;; this is a returned lambda, needing the closure
  (ret)

  (label main)
  (cp arg lam1)
  (call lam3)
  (set arg 2)
  (call resp))) 

本作品;然而,该值就是名字下集X ,然后一个lambda返回时, X 值可以很容易被污染前拉姆达的执行。

This works; however, the value is simply set under the name x and then a lambda returned, the x value could be easily polluted prior to execution of the lambda.

在结构和计算机程序的国米pretation实现的描述是下面这似乎是可行的,以我在议会没有。我不知道他们会使用什么战术等。

The description of implementation in Structure and Interpretation of Computer Programs was the following, which does not seem doable to me in assembly. I don't know what other tactics they could be using.

的程序对象将在运行时由当前环境(在定义的点的环境中)与该入口点组合以已编译的程序(一个新生成的标签)来构造。

The procedure object will be constructed at run time by combining the current environment (the environment at the point of definition) with the entry point to the compiled procedure (a newly generated label).

总之,如何能寄存器值可以用子程序有关?可以堆栈是否足够呢?

推荐答案

堆叠可以是不够的...考虑一个简单的情况下,他们做的。

Stacks cannot be sufficient... consider a simpler case where they do

function bar(f) {
    alert(f());
}

function foo(x) {
    bar(function(){ return x; });
}

foo(42);

在上述情况下,它在理论上是可能的 X 在封闭住的堆栈帧因为封闭不会活得比它的创造者
然而,随着一个小的变化:

In the above case it would be theoretically possible for the x in the closure to live in the stack frame of foo because the closure is not going to outlive its creator foo. However with a small change:

function bar(f) {
    to_call_later.push(f);
}

关闭将被存储起来,并时有可能叫做终止已经和它的活动记录的栈空间已经被回收。显然 X 不能在该堆栈区,因为它必须生存下去。

the closure will be stored away and will be potentially called when foo already terminated and the stack space for its activation record has been reclaimed. Clearly x cannot be in that stack area because it must survive.

因此​​,存在两个问题:

Therefore there are two problems:


  1. 封闭必须有一些存储(环境)。当你认为调用两次传递两个不同的值应为 X 两个独立的存储器,这是显而易见的。如果关闭这仅仅是code,那么这是不可能的,除非不同code正想每次调用时间来生成。

  1. a closure must have some storage (environment). This is obvious when you think that calling foo twice passing two different values should create two independent storages for x. If the closure was just the code then this is not possible unless different code was going to be generated each time you call foo.

此存储必须住的至少只要关闭本身,而不是仅仅作为谁创造的关闭。

this storage must live at least as long as the closure itself, not only as who creates the closure.

还要注意的是,如果你想拥有读/写封闭了变量,你需要额外的间接级别,例如:

Note also that if you want to have read/write closed-over variables you need an extra level of indirection, for example:

function bar(f) {
    alert(f());
}

function foo(x) {
    var c1 = function() { return ++x; };
    var c2 = function() { return x *= 2; };
    bar(c1);
    bar(c2);
}

foo(42);  // displays 42+1=43 and 43*2=86 (not 42*2=84!)

换句话说,你可以有几个不同的闭包共享的相同环境。

所以 X 不能在激活记录的堆栈,它不能在封闭对象本身。封闭对象必须有一个指针 X 是活的。

So x cannot be in the stack of foo activation record and it cannot be in the closure object itself. The closure object must have a pointer to where x is living.

一个可能的解决方案来实现这个上说是86:

A possible solution to implement this on say x86 is:


  • 使用收集垃圾或引用计数的内存管理系统。栈是远远不足以处理关闭。

  • Use a garbage collected or reference-counted memory management system. Stacks are by far insufficient to handle closures.

每个闭包是有两个字段的对象:一个指向code和指针数组来封闭在变量(环境)

Each closure is an object with two fields: a pointer to code and an array of pointers to closed-over variables (the "environment").

在执行code你有一个堆栈尤其键,例如 ESI 所指向的关闭对象本身(所以(ESI)为code的地址,(ESI + 4)首先封闭了变量,的地址(ESI + 8)是第二个地址封闭在变量等)。

When executing code you have a stack esp and e.g. esi is pointing to the closure object itself (so (esi) is the address of the code, (esi+4) is the address of first closed-over variable, (esi+8) is the address of second closed-over variable and so on).

每个变量是一个独立的堆分配对象,可以为有闭包仍然指向它作为长期生存。

Each variable is an independent heap-allocated object that can survive as long as there are still closures pointing to it.

这当然是一种非常原始的方法。例如SBCL是聪明得多,并且没有捕获变量堆栈和/或仅分配寄存器。这需要做的闭包是如何使用的分析。

This is of course a very crude approach. For example SBCL is much smarter and variables that are not captured are allocated on stack and/or registers only. This requires doing an analysis of how a closure is used.

假如你只考虑单纯的功能设置(换句话说,函数/闭包的返回值仅在传递的参数取决于与关闭状态不能突变),那么事情可以简化一点。

Supposing you're only considering a purely functional setting (in other words the return value of a function/closure depends only on the passed parameter and the closure state cannot mutate) then things can be simplified a little.

您可以做的是做,而不是拍摄的变量包含捕获的关闭对象,并通过在同一时间关闭本身能够复制的对象,然后就堆栈在理论上可以使用(除了有一个封闭的大小可以根据需要多大的状态,以捕捉变化问题),所以这并不容易,至少对我来说,想象一个合理的堆栈只是基于参数协议传递和返回值,在这种情况下。

What you can do is making the closure object containing the captured values instead of the captured variables and by making at the same time the closure itself a copyable object then just a stack can in theory be used (except that there is the problem that a closure can vary in size depending on how much state needs to capture), so it's not easy at least for me to imagine a reasonable stack-only based protocol for parameter passing and value returning in this case.

通过关闭一个固定大小的对象,你可以看到这个C程序如何实现只用栈关闭拆除可变大小的问题(注意有没有的malloc 话费)

Removing the variable size problem by making the closure a fixed-size object you can see how this C program can implement closures using only stack (note that there are no malloc calls)

#include <stdio.h>

typedef struct TClosure {
    int (*code)(struct TClosure *env, int);
    int state;
} Closure;

int call(Closure *c, int x) {
    return c->code(c, x);
}

int adder_code(Closure *env, int x) {
    return env->state + x;
}

int multiplier_code(Closure *env, int x) {
    return env->state * x;
}

Closure make_closure(int op, int k) {
    Closure c;
    c.state = k;
    c.code = (op == '+' ? adder_code : multiplier_code);
    return c;
}

int main(int argc, const char *argv[]) {
    Closure c1 = make_closure('+', 10);
    Closure c2 = make_closure('*', 3);
    printf("c1(3) = %i, c2(3) = %i\n",
           call(&c1, 3), call(&c2, 3));
    return 0;
}

关闭结构可以通过,返回并存储在堆栈,因为环境是只读的,所以你不必寿命问题,因为不可变的数据可以不被复制影响语义。

Closure structs can be passed, returned and stored on stack because the environment is read-only so you don't have the lifetime problem because immutable data can be copied without affecting semantic.

C编译器可以使用这种方法来创建闭包,只能通过值捕获的变量,的确是C ++ 11的lambda提供(您可以通过引用也捕捉到,但它是由程序员,以确保寿命捕获的变量持续足够)。

A C compiler could use such an approach to create closures that can only capture variables by value, and indeed is what C++11 lambda provide (you can capture also by reference, but it's up to the programmer to ensure that the lifetime of captured variables lasts enough).

这篇关于在编译器实现瓶盖的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆