Where/how to declare the unique key of variables in a compiler written in OCaml?


Problem description


I am writing a compiler for mini-Pascal in OCaml. I would like my compiler to accept the following code, for instance:

program test;
var
   a,b : boolean;
   n : integer;
begin
   ...
end.

I have difficulties in dealing with the declaration of variables (the part following var). At the moment, the type of variables is defined like this in sib_syntax.ml:

type s_var =
    { s_var_name: string;
      s_var_type: s_type; 
      s_var_uniqueId: s_uniqueId (* key *) }

Here s_var_uniqueId (rather than s_var_name) is the unique key of a variable. My first question is where and how I could implement the mechanism that generates a new id (in practice, by increasing the biggest id by 1) every time I get a new variable. I am wondering whether I should implement it in sib_parser.mly, which probably involves a static variable cur_id and a modification of the binding part, though again I don't know how to realize them in .mly. Or should I implement the mechanism at the next stage, in interpreter.ml? In that case, the question is how to make the .mly consistent with the type s_var: what s_var_uniqueId should I provide in the binding part?

Another question is about this part of statement in .mly:

id = IDENT COLONEQ e = expression
  { Sc_assign (Sle_var {s_var_name = id; s_var_type = St_void}, e) }

Here, I also need to provide the next level (the interpreter.ml) a variable of which I only know the s_var_name, so what could I do regarding its s_var_type and s_var_uniqueId here?

Could anyone help? Thank you very much!

Solution

The first question to ask yourself is whether you actually need a unique id. In my experience, they're almost never necessary or even useful. If what you're trying to do is make variables unique up to alpha-equivalence, then this should happen after parsing is complete, and will probably involve some form of De Bruijn indices instead of unique identifiers.

Either way, a function which returns a new integer identifier every time it is called is:

let unique = 
  let last = ref 0 in 
  fun () -> incr last ; !last

let one = unique ()  (* 1 *)
let two = unique ()  (* 2 *)

So, you can simply assign { ... ; s_var_uniqueId = unique () } in your Menhir rules.
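As a minimal sketch of how that counter might be used when building variable records: the `s_type` constructors, the choice of `int` for `s_uniqueId`, and the helpers `make_var` and `declare` are all assumptions made for illustration, not part of the original code.

```ocaml
(* Assumed stand-ins for the types in sib_syntax.ml. *)
type s_type = St_int | St_bool | St_void
type s_uniqueId = int

type s_var =
  { s_var_name: string;
    s_var_type: s_type;
    s_var_uniqueId: s_uniqueId }

(* The counter from the answer: the reference is captured by the closure,
   so nothing outside [unique] can reset or modify it. *)
let unique =
  let last = ref 0 in
  fun () -> incr last; !last

(* Hypothetical helper: one fresh id per declared variable. *)
let make_var name typ =
  { s_var_name = name; s_var_type = typ; s_var_uniqueId = unique () }

(* Hypothetical helper for a declaration such as "a, b : boolean". *)
let declare names typ = List.map (fun n -> make_var n typ) names

(* Sequenced with separate lets so the ids follow declaration order. *)
let bools = declare ["a"; "b"] St_bool
let ints = declare ["n"] St_int
```

In a Menhir grammar, a definition like `unique` would typically sit in the `%{ ... %}` header so that semantic actions can call it. Because the ids depend on the order in which actions run, it is safer not to rely on specific id values, only on their being distinct.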

The more important problem you're trying to solve here is that of variable binding. Variable x is defined in one location and used in another, and you need to determine that it happens to be the same variable in both places. There are many ways of doing this, one of them being to delay the binding until the interpreter. I'm going to show you how to deal with this during parsing.

First, I'm going to define a context: it's a set of variables that allows you to easily retrieve a variable based on its name. You might want to create it with hash tables or maps, but to keep things simple I will be using List.assoc here.

type s_context = {
  s_ctx_parent : s_context option ;
  s_ctx_bindings : (string * (int * s_type)) list ;
  s_ctx_size : int ;
}

let empty_context parent = {
  s_ctx_parent = parent ;
  s_ctx_bindings = [] ;
  s_ctx_size = 0
}

let bind v_name v_type ctx = 
  try let _ = List.assoc v_name ctx.s_ctx_bindings in
      failwith "Variable is already defined"
  with Not_found -> 
    { ctx with 
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type)) 
        :: ctx.s_ctx_bindings ;
      s_ctx_size = ctx.s_ctx_size + 1 }

let rec find v_name ctx =       
  try 0, List.assoc v_name ctx.s_ctx_bindings
  with Not_found -> 
    match ctx.s_ctx_parent with 
      | Some parent -> let depth, found = find v_name parent in
                       depth + 1, found
      | None -> failwith "Variable is not defined"

So, bind adds a new variable to the current context, while find looks for a variable in the current context and then in its parents, and returns both the bound data and the depth at which it was found. You could have all global variables in one context, then all parameters of a function in another context that has the global context as its parent, then all local variables of a function (once you have them) in a third context that has the function's main context as its parent, and so on.
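To make the nesting concrete, here is a small self-contained sketch with a global context and a parameter context on top of it. The `s_type` constructors and the variable names are assumptions, and `bind` is written with `List.mem_assoc` rather than `try`, which should behave the same way.

```ocaml
type s_type = St_int | St_bool  (* assumed constructors *)

type s_context = {
  s_ctx_parent : s_context option;
  s_ctx_bindings : (string * (int * s_type)) list;
  s_ctx_size : int;
}

let empty_context parent =
  { s_ctx_parent = parent; s_ctx_bindings = []; s_ctx_size = 0 }

(* Adds a variable at the next free position of the current context. *)
let bind v_name v_type ctx =
  if List.mem_assoc v_name ctx.s_ctx_bindings then
    failwith "Variable is already defined"
  else
    { ctx with
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type)) :: ctx.s_ctx_bindings;
      s_ctx_size = ctx.s_ctx_size + 1 }

(* Returns (depth, (position, type)), searching outward through parents. *)
let rec find v_name ctx =
  try 0, List.assoc v_name ctx.s_ctx_bindings
  with Not_found ->
    match ctx.s_ctx_parent with
    | Some parent ->
      let depth, found = find v_name parent in
      depth + 1, found
    | None -> failwith "Variable is not defined"

(* Globals: a, b : boolean; n : integer. *)
let globals =
  empty_context None |> bind "a" St_bool |> bind "b" St_bool |> bind "n" St_int

(* A function scope whose parameter n shadows the global n. *)
let params =
  empty_context (Some globals) |> bind "x" St_int |> bind "n" St_bool

let local_x = find "x" params    (* found in the innermost context *)
let shadowed_n = find "n" params (* the parameter, not the global *)
let global_b = find "b" params   (* found one level up *)
```

Note that shadowing falls out of the search order: the innermost binding is found first, so the depth tells you which scope a name resolved to.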

So, for instance, find "x" ctx will return something like 0, (3, St_int) where 0 is the De Bruijn index of the variable, 3 is the position of the variable in the context identified by the De Bruijn index, and St_int is the type.

type s_var = {
  s_var_deBruijn: int;
  s_var_type: s_type;
  s_var_pos: int 
}

let find v_name ctx = 
   let deBruijn, (pos, typ) = find v_name ctx in 
   { s_var_deBruijn = deBruijn ;
     s_var_type = typ ;
     s_var_pos = pos }

Of course, you need your functions to store their context, and make sure that the first argument is the variable at position 0 within the context:

type s_fun =
{ s_fun_name: string;
  s_fun_type: s_type;
  s_fun_params: s_context; 
  s_fun_body: s_block; }

let context_of_paramlist parent paramlist = 
  List.fold_left 
    (fun ctx (v_name,v_type) -> bind v_name v_type ctx) 
    (empty_context parent)
    paramlist
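A short sketch of why the fold gives the ordering you want: `List.fold_left` binds parameters left to right, so each one gets the next position. The parameter list and the `position` helper below are hypothetical, and the context type is repeated only to keep the example self-contained.

```ocaml
type s_type = St_int | St_bool  (* assumed constructors *)

type s_context = {
  s_ctx_parent : s_context option;
  s_ctx_bindings : (string * (int * s_type)) list;
  s_ctx_size : int;
}

let empty_context parent =
  { s_ctx_parent = parent; s_ctx_bindings = []; s_ctx_size = 0 }

let bind v_name v_type ctx =
  if List.mem_assoc v_name ctx.s_ctx_bindings then
    failwith "Variable is already defined"
  else
    { ctx with
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type)) :: ctx.s_ctx_bindings;
      s_ctx_size = ctx.s_ctx_size + 1 }

let context_of_paramlist parent paramlist =
  List.fold_left
    (fun ctx (v_name, v_type) -> bind v_name v_type ctx)
    (empty_context parent)
    paramlist

(* Hypothetical parameter list for f(x : integer; flag : boolean). *)
let ctx = context_of_paramlist None [("x", St_int); ("flag", St_bool)]

(* Hypothetical helper: the position assigned to a parameter. *)
let position name = fst (List.assoc name ctx.s_ctx_bindings)
```

Here `position "x"` is 0 and `position "flag"` is 1, which is what lets an argument array be indexed directly by position later on. A duplicate parameter name is rejected by `bind`.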

Then, you can change your parser to take into account the context. The trick is that instead of returning an object representing part of your AST, most of your rules will return a function that takes a context as an argument and returns an AST node.

For instance:

int_expression:
  (* Constant : ignore the context *)
| c = INT { fun _ -> Se_const (Sc_int c) }
  (* Variable : look for the variable inside the context *)
| id = IDENT { fun ctx -> Se_var (find id ctx) }
  (* Subexpressions : pass the context to both *)
| e1 = int_expression o = operator e2 = int_expression 
  { fun ctx -> Se_binary (o, e1 ctx, e2 ctx) }
;
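The same idea can be tried in plain OCaml, outside the grammar. The sketch below is a simplification under stated assumptions: the context is a flat association list from names to positions, the AST is reduced to three constructors, and `delayed`, `const`, `var` and `binary` are hypothetical names standing in for what each rule would return.

```ocaml
(* Simplified context: names mapped to positions, no parent scopes. *)
type ctx = (string * int) list

type expr =
  | Se_const of int
  | Se_var of int                      (* resolved position *)
  | Se_binary of string * expr * expr

(* What each rule returns: an AST node still waiting for its context. *)
type delayed = ctx -> expr

(* Constant: ignores the context. *)
let const c : delayed = fun _ -> Se_const c

(* Variable: resolved only once a context is supplied. *)
let var id : delayed = fun ctx -> Se_var (List.assoc id ctx)

(* Binary node: hands the same context to both subexpressions. *)
let binary o (e1 : delayed) (e2 : delayed) : delayed =
  fun ctx ->
    let left = e1 ctx in
    let right = e2 ctx in
    Se_binary (o, left, right)

(* "n + 1" is built before any variable is known ... *)
let delayed_expr = binary "+" (var "n") (const 1)

(* ... and resolved once the declarations are available. *)
let resolved = delayed_expr [("a", 0); ("n", 1)]
```

Nothing is looked up while the expression is being assembled, so the order in which the parser reduces rules does not matter. An undeclared name only fails when the context is finally applied (here with `Not_found` from `List.assoc`).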

So, you simply propagate the context "down" recursively through the expressions. The only clever parts are those when new contexts are created (you don't have this syntax yet, so I'm just adding a placeholder):

| function_definition_expression (args, body) 
  { fun ctx -> let ctx = context_of_paramlist (Some ctx) args in
               { s_fun_params = ctx ; 
                 s_fun_body = body ctx } }

The same goes for the global context: the program rule itself does not return a function, but the block rule does, so a context is created from the globals and passed to it.

prog:
  PROGRAM IDENT SEMICOLON
  globals = variables
  main = block
  DOT
    { let ctx = context_of_paramlist None globals in 
      { globals = ctx;
        main = main ctx } }

All of this makes the implementation of your interpreter much easier due to the DeBruijn indices: you can have a "stack" which holds your values (of type value) defined as:

type stack = value array list 

Then, reading and writing variable x is as simple as:

let read stack x = 
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos)

let write stack x value = 
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value

Also, since we made sure that function parameters are in the same order as their position in the function context, if you want to call function f and its arguments are stored in the array args, then constructing the stack is as simple as:

let inner_stack = args :: stack in
(* Evaluate f.s_fun_body with inner_stack here *)
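Putting the pieces together, here is a minimal runnable sketch of a call that reads a parameter and writes a global. The `value` constructors and the concrete variables are assumptions, and the `s_var_type` field is omitted to keep the example short.

```ocaml
type value = V_int of int | V_bool of bool  (* assumed value type *)

(* Resolved variable, as produced during parsing (type field omitted). *)
type s_var = { s_var_deBruijn : int; s_var_pos : int }

type stack = value array list

let read (stack : stack) x =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos)

let write (stack : stack) x value =
  (List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value

(* Global frame for: a, b : boolean; n : integer. *)
let globals = [| V_bool false; V_bool true; V_int 0 |]
let stack : stack = [ globals ]

(* Calling a one-parameter function: push its argument frame. *)
let args = [| V_int 41 |]
let inner_stack = args :: stack

(* Inside the call, the parameter is at depth 0 and the global n at depth 1. *)
let param_x = { s_var_deBruijn = 0; s_var_pos = 0 }
let global_n = { s_var_deBruijn = 1; s_var_pos = 2 }

(* Body of the hypothetical function: n := x + 1. *)
let () =
  match read inner_stack param_x with
  | V_int x -> write inner_stack global_n (V_int (x + 1))
  | V_bool _ -> failwith "type error"
```

Since the frames are arrays shared between `stack` and `inner_stack`, the write made during the call is still visible in the global frame after the call returns. `List.nth` is linear in the depth, which is usually acceptable when scopes are shallow.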

But I'm sure you'll have a lot more questions to ask when you start working on your interpreter ;)
