在Haskell中实现语言解释器 [英] Implementing a language interpreter in Haskell
问题描述
我想在Haskell中实现一个命令式的语言解释器(用于教育目的)。但是我很难为我的解释器创建正确的架构:我应该如何存储变量?我如何实现嵌套函数调用?我应该如何实现变量范围?如何在我的语言中添加调试可能性?我应该使用monads / monad变压器/其他技术吗?等等。
有没有人知道关于这个主题的优秀文章/论文/教程/资料来源?
以下内容可能会用作minitutorial。
我假设您已经解决了解析所需程序的源文本的问题写一个解释器,并且你有一些类型来捕获你的语言的抽象语法。我在这里使用的语言非常简单,只有整数表达式和一些基本语句。
Preliminaries
让我们先导入一些我们稍后会用到的模块。
import Data.Function
import Data .List
命令式语言的本质是它具有某种形式的可变变量。这里,变量只是由字符串表示:
type Var = String
表达式
接下来,我们定义表达式。表达式由整型常量,变量引用和算术运算构成。
infixl 6:+:,: - :
infixl 7:* :,:/:
数据Exp
= C Int - 常数
| V Var - 变量
| Exp:+:Exp - addition
| Exp: - :Exp - 减去
| Exp:*:Exp - 乘法
| Exp:/:Exp - 除法
例如,将常量2加到变量上的表达式 x
由 Vx表示:+:C 2
。
语句
语句语言相当简单。我们有三种形式的陈述:变量赋值,循环和序列。
infix 1:=
数据stmt
= Var:= Exp - 赋值
| Exp Stmt - 循环
| Seq [stmt] - sequence
例如,用于交换变量 x
和 y
可以用 Seq [tmp:= V x,x:= Vy,y:= Vtmp]
。
程序
程序只是一个声明。
type Prog = Stmt
商店
现在,让我们转到实际翻译。运行程序时,我们需要跟踪分配给程序中不同变量的值。这些值只是整数,作为我们记忆的表示,我们只是使用由变量和值组成的对的列表。
类型Val = Int
类型Store = [(Var,Val)]
评估表达式通过将常量映射到它们的值,查找存储器中变量的值以及将算术运算映射到它们的Haskell对应物来评估表达式。$ / $>
eval :: Exp - >商店 - > Val
pre>
eval(C n)r = n
eval(V x)r =
的情况查找x r Nothing - >错误(未绑定的变量`++ x ++')
Just v - > v
eval(e1:+:e2)r = eval e1 r + eval e2 r
eval(e1: - :e2)r = eval e1 r - eval e2 r
eval(e1 :*:e2)r = eval e1 r * eval e2 r
eval(e1:/:e2)r = eval e1 r`div` eval e2 r
请注意,如果商店包含多个变量绑定,
lookup
会选择第一个绑定存储。
执行语句
虽然表达式的评估不能改变存储的内容,但执行声明实际上可能会导致商店更新。因此,执行语句的函数将商店作为参数,并生成可能更新的商店。
exec :: Stmt - >商店 - >存储
exec(x:= e)r =(x,eval e r):r
exec(while e s)r | eval e r / = 0 = exec(Seq [s,While e s])r
|否则= r
exec(Seq [])r = r
exec(Seq(s:ss))r = exec(Seq ss)(exec sr)
顶级解释器
运行一个程序可以简化为执行顶级语句一个初始商店。
run :: Prog - >商店 - >存储
运行pr = nubBy((==)``fst)(exec pr)
<执行语句后,我们清理任何阴影绑定,以便我们可以轻松读取最终存储的内容。
示例
作为例子,考虑下面的程序计算存储在变量
n
中的数字的斐波那契数,并将其结果存储在变量x
。fib :: Prog
fib = Seq $ b $b=C 0
,y:= C 1
,while(Vn)$ Seq
[z:= Vx :+:Vy
,x:= Vy
,y:= Vz
,n:= Vn : - :C 1
]
]
例如,环境,我们现在可以使用我们的解释器来计算第25个斐波纳契数:
> lookupx$ run fib [(n,25)]
只需75025
Monadic Interpretation
当然,在这里,我们正在处理一个非常简单和微小的命令式语言。随着您的语言变得越来越复杂,您的口译员的实施也将越来越复杂。例如,考虑添加过程时需要添加哪些内容,并需要区分本地(基于堆栈)的存储和全局(基于堆)的存储。回到你的问题的这一部分,你可能确实会考虑引入monad来简化你的解释器的实现。
在上面的示例解释器中,有两种效应可以被一元结构捕获:
- 传递和更新商店。
- 遇到运行时错误时中止运行程序。 (在上面的实现中,当发生这样的错误时,解释器会崩溃。)
第一个效果通常由状态monad ,第二个由错误monad。让我们简单地研究如何为我们的解释器做到这一点。
我们通过从标准库中导入更多模块来准备。
import Control.Monad
我们可以使用monad变换器通过组合基本状态monad和基本错误monad来为我们的两种效果构建复合monad。在这里,我们只是简单地构建复合monad。
newtype Interp a = Interp {runInterp :: Store - > ; String(a,Store)}
实例Monad Interp其中
return x = Interp $ \r - > Right(x,r)
i>> = k = Interp $ \r - >
的情况runInterp i r Left msg - >留下msg
Right(x,r') - > runInterp(k x)r'
fail msg = Interp $ \_ - >左msg
为了读写商店,我们引入了有效的函数
rd
和wr
:
rd: :Var - > Interp Val
rd x = Interp $ \r - >案例查询x r of
Nothing - >左(unbound变量`++ x ++')
只是v - > Right(v,r)
wr :: Var - > Val - > Interp()
wr x v = Interp $ \r - >请注意,右(()如果变量查找失败,rd
产生一个Left
包装的错误消息。
单表达式求值器的单值版本现在读取
eval :: Exp - > Interp Val
eval(C n)= do返回n
eval(V x)= do rd x
eval(e1:+:e2)= do v1 < - eval e1
v2 < - eval e2
return(v1 + v2)
eval(e1: - :e2)= do v1 < - eval e1
v2 < - eval e2
return(v1-v2)
eval(e1:*:e2)= do v1 < - eval e1
v2 < - eval e2
return(v1 * v2)
eval(e1:/:e2)= do v1 < - eval e1
v2 < - eval e2
if v2 == 0
然后失败零除
else return(v1`div` v2)
在
:/:
,除以零会导致通过Monad
- 方法失败$ c产生错误消息$ c>,这对于
Interp
来说简化为将消息包装在Left
- 值中。
为执行语句我们有
exec :: Stmt - > Interp()
exec(x:= e)= do v < - eval e
wr xv
exec(While es)= do v < - eval e
when (vq = 0)(exec(Seq [s,While es]))
exec(Seq [])= do return()
exec(Seq(s:ss))= do exec s
exec(Seq ss)
exec 表示这些语句不会导致值,但是只会对它们对商店的影响或它们可能触发的运行时错误执行。
最后,在函数
run
中,我们执行一次性计算并处理其效果。
run :: Prog - >商店 - >任一字符串存储
运行p r = case
的runInterp(exec p)r Left msg - >左msg
右(_,r') - >在交互式环境中,右键点击(nubBy((== ==``fst)r')
我们现在可以重新审视我们的示例程序的解释:
>查找x`fmap` run fib [(n,25)]
正确(仅75025)
> lookupx`fmap` run fib []
Leftunbound variable`n'
I want to implement an imperative language interpreter in Haskell (for educational purposes). But it's difficult for me to create right architecture for my interpreter: How should I store variables? How can I implement nested function calls? How should I implement variable scoping? How can I add debugging possibilities in my language? Should I use monads/monad transformers/other techniques? etc.
Does anybody know good articles/papers/tutorials/sources on this subject?
解决方案If you are new to writing this kind of processors, I would recommend to put off using monads for a while and first focus on getting a barebones implementation without any bells or whistles.
The following may serve as a minitutorial.
I assume that you have already tackled the issue of parsing the source text of the programs you want to write an interpreter for and that you have some types for capturing the abstract syntax of your language. The language that I use here is very simple and only consists of integer expressions and some basic statements.
Preliminaries
Let us first import some modules that we will use in just a bit.
import Data.Function import Data.List
The essence of an imperative language is that it has some form of mutable variables. Here, variables simply represented by strings:
type Var = String
Expressions
Next, we define expressions. Expressions are constructed from integer constants, variable references, and arithmetic operations.
infixl 6 :+:, :-: infixl 7 :*:, :/: data Exp = C Int -- constant | V Var -- variable | Exp :+: Exp -- addition | Exp :-: Exp -- subtraction | Exp :*: Exp -- multiplication | Exp :/: Exp -- division
For example, the expression that adds the constant 2 to the variable
x
is represented byV "x" :+: C 2
.Statements
The statement language is rather minimal. We have three forms of statements: variable assignments, while loops, and sequences.
infix 1 := data Stmt = Var := Exp -- assignment | While Exp Stmt -- loop | Seq [Stmt] -- sequence
For example, a sequence of statements for "swapping" the values of the variables
x
andy
can be represented bySeq ["tmp" := V "x", "x" := V "y", "y" := V "tmp"]
.Programs
A program is just a statement.
type Prog = Stmt
Stores
Now, let us move to the actual interpreter. While running a program, we need to keep track of the values assigned to the different variables in the programs. These values are just integers and as a representation of our "memory" we just use lists of pairs consisting of a variable and a value.
type Val = Int type Store = [(Var, Val)]
Evaluating expressions
Expressions are evaluated by mapping constants to their value, looking up the values of variables in the store, and mapping arithmetic operations to their Haskell counterparts.
eval :: Exp -> Store -> Val eval (C n) r = n eval (V x) r = case lookup x r of Nothing -> error ("unbound variable `" ++ x ++ "'") Just v -> v eval (e1 :+: e2) r = eval e1 r + eval e2 r eval (e1 :-: e2) r = eval e1 r - eval e2 r eval (e1 :*: e2) r = eval e1 r * eval e2 r eval (e1 :/: e2) r = eval e1 r `div` eval e2 r
Note that if the store contains multiple bindings for a variable,
lookup
selects the bindings that comes first in the store.Executing statements
While the evaluation of an expression cannot alter the contents of the store, executing a statement may in fact result in an update of the store. Hence, the function for executing a statement takes a store as an argument and produces a possibly updated store.
exec :: Stmt -> Store -> Store exec (x := e) r = (x, eval e r) : r exec (While e s) r | eval e r /= 0 = exec (Seq [s, While e s]) r | otherwise = r exec (Seq []) r = r exec (Seq (s : ss)) r = exec (Seq ss) (exec s r)
Note that, in the case of assignments, we simply push a new binding for the updated variable to the store, effectively shadowing any previous bindings for that variable.
Top-level Interpreter
Running a program reduces to executing its top-level statement in the context of an initial store.
run :: Prog -> Store -> Store run p r = nubBy ((==) `on` fst) (exec p r)
After executing the statement we clean up any shadowed bindings, so that we can easily read off the contents of the final store.
Example
As an example, consider the following program that computes the Fibonacci number of the number stored in the variable
n
and stores its result in the variablex
.fib :: Prog fib = Seq [ "x" := C 0 , "y" := C 1 , While (V "n") $ Seq [ "z" := V "x" :+: V "y" , "x" := V "y" , "y" := V "z" , "n" := V "n" :-: C 1 ] ]
For instance, in an interactive environment, we can now use our interpreter to compute the 25th Fibonacci number:
> lookup "x" $ run fib [("n", 25)] Just 75025
Monadic Interpretation
Of course, here, we are dealing with a very simple and tiny imperative language. As your language gets more complex, so will the implementation of your interpreter. Think for example about what additions you need when you add procedures and need to distinguish between local (stack-based) storage and global (heap-based) storage. Returning to that part of your question, you may then indeed consider the introduction of monads to streamline the implementation of your interpreter a bit.
In the example interpreter above, there are two "effects" that are candidates for being captured by a monadic structure:
- The passing around and updating of the store.
- Aborting running the program when a run-time error is encountered. (In the implementation above, the interpreter simply crashes when such an error occurs.)
The first effect is typically captured by a state monad, the second by an error monad. Let us briefly investigate how to do this for our interpreter.
We prepare by importing just one more module from the standard libraries.
import Control.Monad
We can use monad transformers to construct a composite monad for our two effects by combining a basic state monad and a basic error monad. Here, however, we simply construct the composite monad in one go.
newtype Interp a = Interp { runInterp :: Store -> Either String (a, Store) } instance Monad Interp where return x = Interp $ \r -> Right (x, r) i >>= k = Interp $ \r -> case runInterp i r of Left msg -> Left msg Right (x, r') -> runInterp (k x) r' fail msg = Interp $ \_ -> Left msg
For reading from and writing to the store, we introduce effectful functions
rd
andwr
:rd :: Var -> Interp Val rd x = Interp $ \r -> case lookup x r of Nothing -> Left ("unbound variable `" ++ x ++ "'") Just v -> Right (v, r) wr :: Var -> Val -> Interp () wr x v = Interp $ \r -> Right ((), (x, v) : r)
Note that
rd
produces aLeft
-wrapped error message if a variable lookup fails.The monadic version of the expression evaluator now reads
eval :: Exp -> Interp Val eval (C n) = do return n eval (V x) = do rd x eval (e1 :+: e2) = do v1 <- eval e1 v2 <- eval e2 return (v1 + v2) eval (e1 :-: e2) = do v1 <- eval e1 v2 <- eval e2 return (v1 - v2) eval (e1 :*: e2) = do v1 <- eval e1 v2 <- eval e2 return (v1 * v2) eval (e1 :/: e2) = do v1 <- eval e1 v2 <- eval e2 if v2 == 0 then fail "division by zero" else return (v1 `div` v2)
In the case for
:/:
, division by zero results in an error message being produced through theMonad
-methodfail
, which, forInterp
, reduces to wrapping the message in aLeft
-value.For the execution of statements we have
exec :: Stmt -> Interp () exec (x := e) = do v <- eval e wr x v exec (While e s) = do v <- eval e when (v /= 0) (exec (Seq [s, While e s])) exec (Seq []) = do return () exec (Seq (s : ss)) = do exec s exec (Seq ss)
The type of
exec
conveys that statements do not result in values but are executed only for their effects on the store or the run-time errors they may trigger.Finally, in the function
run
we perform a monadic computation and process its effects.run :: Prog -> Store -> Either String Store run p r = case runInterp (exec p) r of Left msg -> Left msg Right (_, r') -> Right (nubBy ((==) `on` fst) r')
In the interactive environment, we can now revisit the interpretation of our example program:
> lookup "x" `fmap` run fib [("n", 25)] Right (Just 75025) > lookup "x" `fmap` run fib [] Left "unbound variable `n'"
这篇关于在Haskell中实现语言解释器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!