Haskell 解析器到 AST 数据类型,赋值 [英] Haskell parser to AST data type, assignment

查看:33
本文介绍了Haskell 解析器到 AST 数据类型,赋值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经在互联网上搜索了几天,试图找到我的问题的答案,我终于承认失败了.
我得到了一个语法:

挖掘::= 0 |1 |2 |3 |4 |5 |6 |7 |8 |9
整数 ::= 挖 |挖掘国际
变量 ::= a |乙 |... z |一个 |乙 |C |... |Z
表达式 ::= 整数 |- 表达式 |+ Expr Expr |* Expr Expr |无功|让 Var = Expr in Expr

我被告知使用这种语法解析、评估和打印表达式
其中运算符 * + - 具有其正常含义
具体任务是写一个函数parse::String ->AST

以字符串作为输入并在输入格式正确时返回抽象语法树(我可以假设它是).

我被告知我可能需要一个合适的数据类型,并且该数据类型可能需要从其他一些类派生.

遵循示例输出
data AST = Leaf Int |总和 AST |最小 AST |...

此外,我应该考虑编写一个函数
tokens::String ->[字符串]
将输入字符串拆分为令牌列表
解析应该用来完成ast::[字符串] ->(AST,[String])
其中输入是一个标记列表,它输出一个 AST,为了解析子表达式,我应该简单地递归使用 ast 函数.

我还应该制作一个printExpr方法来打印结果,以便
printE: AST ->字符串
printE(parse "* 5 5") 产生 "5*5""(5*5)"
以及评估表达式的函数
evali :: AST ->内部

我只想指出我可能开始的正确方向.我一般对 Haskell 和 FP 知之甚少,为了解决这个任务,我用 Java 制作了一些字符串处理函数,这让我意识到我偏离了轨道.
所以在正确的方向上有一个小指针,也许是对 AST 应该是什么样子的如何"的解释
连续第三天仍然没有运行代码,我真的很感谢任何帮助我找到解决方案的尝试!提前致谢!
编辑

我可能不清楚:我想知道我应该如何从读取和标记输入字符串到制作 AST.

解决方案

将标记解析为抽象语法树

好的,让我们来看看你的语法

Dig ::= 0 |1 |2 |3 |4 |5 |6 |7 |8 |9整数 ::= 挖 |挖掘智力变量 ::= a |乙 |... z |一个 |乙 |C |... |Z表达式 ::= 整数 |- 表达式 |+ Expr Expr |* Expr Expr |无功|让 Var = Expr in Expr

这是一个很好的简单语法,因为您可以从第一个标记中看出它将是哪种类型的表达.(如果有更复杂的东西,比如 + 介于数字之间,或者 - 被用作减法以及否定,你需要成功列表技巧,解释在函数解析器.)

让我们有一些示例原始输入:

rawinput = "- 6 + 45 let x = - 5 in * x x"

我从语法上理解的代表"(- 6 (+ 45 (let x=-5 in (* x x))))",我假设你把它标记为

tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]

符合语法,但你可能已经得到

tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]

哪个更适合您的示例 AST.我认为用你的语法来命名你的 AST 是个好习惯,所以我要继续替换

data AST = Leaf Int |总和 AST |最小 AST |...

data Expr = E_Int Int |E_Neg Expr |E_Sum Expr Expr |E_Prod Expr Expr |E_Var 字符|E_Let {letvar::Char,letequal:: Expr,letin::Expr}导出显示

我已经命名了 E_Let 的位,以使其更清楚地代表什么.

编写解析函数

你可以通过添加import Data.Char (isDigit)来使用isDigit来帮助:

expr :: [String] ->(表达式,[字符串])expr [] = 错误意外的输入结束"expr (s:ss) |all isDigit s = (E_Int (read s),ss)|s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')|s == "+" = (E_Sum e e',ss'') 其中(e,ss') = expr ss(e',ss'') = expr ss'-- 更多案例

哎呀!太多的 let 子句掩盖了含义,我们将为 E_Prod 编写相同的代码,而为 E_Let 编写更糟糕的代码.让我们解决这个问题!

处理这个的标准方法是编写一些组合器;与其通过我们的定义无聊地将输入 [String] 线程化,而是定义方法弄乱解析器(映射)的输出并组合多个解析器合二为一(提升).

清理代码1:map

首先我们应该定义 pmap,我们自己的 map 等价函数,这样我们就可以做 pmap E_Neg (expr1 ss)而不是 let (e,ss') = expr1 ss in (E_Neg e,ss')

pmap :: (a -> b) ->([字符串] -> (a,[字符串])) ->([字符串] -> (b,[字符串]))

nonono,我什至看不懂!我们需要一个类型同义词:

type Parser a = [String] ->(a,[字符串])pmap :: (a -> b) ->解析器 a ->解析器bpmap f p = ss ->让 (a,ss') = p ss在 (f a,ss')

但如果我真的这样做会更好

data Parser a = Par [String] ->(a,[字符串])

这样我就可以了

instance Functor Parser wherefmap f (Par p) = Par (pmap f p)

如果你喜欢,我会把它留给你去弄清楚.

清理代码2:合并两个解析器

我们还需要处理有两个解析器要运行的情况,并且我们想使用一个函数来组合他们的结果.这称为将函数提升到解析器.

liftP2 :: (a -> b -> c) ->解析器 a ->解析器 b ->解析器cLiftP2 f p1 p2 = ss0 ->让(a,ss1) = p1 ss0(b,ss2) = p2 ss1在 (f a b,ss2)

甚至三个解析器:

liftP3 :: (a -> b -> c -> d) ->解析器 a ->解析器 b ->解析器 c ->解析器

我会让你想想怎么做.在 let 语句中,您需要 liftP5 来解析 let 语句的各个部分,提升一个忽略 "=""in" 的函数.你可以做

equals_::Parser()equals_ [] = 错误equals_:预期 = 但输入结束"等于_ ("=":ss) = ((),ss)equals_ (s:ss) = error $ "equals_: expected = but got "++s

还有一些可以帮助解决这个问题.

实际上,pmap 也可以称为 liftP1,但 map 是这种东西的传统名称.

用漂亮的组合器重写

现在我们准备好清理expr:

expr :: [String] ->(表达式,[字符串])expr [] = 错误意外的输入结束"expr (s:ss) |all isDigit s = (E_Int (read s),ss)|s == "-" = pmap E_Neg expr ss|s == "+" = liftP2 E_Sum expr expr ss-- 更多案例

那一切正常.真的,没关系.但是 liftP5 会有点长,而且感觉很乱.

进一步清理 - 非常好的 Applicative 方式

应用函子是要走的路.记得我建议重构为

data Parser a = Par [String] ->(a,[字符串])

所以你可以让它成为Functor的实例?也许你不想,因为你所获得的只是一个完美运行的 pmap 的新名称 fmap 和您必须处理所有那些使您的代码变得混乱的 Par 构造函数.不过,也许这会让你重新考虑;我们可以import Control.Applicative,然后使用 data 声明,我们可以定义 <*>,这意味着 then 并使用 <$> 而不是 pmap>,带有*>的意思<*>-but-forget-the-the-result-of-the-left-hand-side 所以你会写

expr (s:ss) |s == "let" = E_Let <$>变量 *>等于_<*>expr <*>在_ *>表达式

这看起来很像您的语法定义,因此很容易编写第一次运行的代码.这就是我喜欢编写解析器的方式.事实上,这就是我喜欢写很多东西的方式.你只需要定义 fmap<*>pure,都是简单的,不需要重复的 liftP3liftP4

阅读有关应用函子的信息.他们很棒.

使解析器适用的提示:pure 不会更改列表.<*>liftP2 类似,但函数不是来自外部,它来自 p1 的输出.>

I have been searching around the interwebs for a couple of days, trying to get an answer to my questions and i'm finally admitting defeat.
I have been given a grammar:

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

And i have been told to parse, evaluate and print expressions using this grammar
where the operators * + - has their normal meaning
The specific task is to write a function parse :: String -> AST

that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which i can asume it is).

I am told that i might need a suitable data type and that data type might need to derive from some other classes.

Following an example output
data AST = Leaf Int | Sum AST AST | Min AST | ...

Further more, i should consider writing a function
tokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished with
ast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.

I should also make a printExpr method to print the result so that
printE: AST -> String
printE(parse "* 5 5") yields either "5*5" or "(5*5)"
and also a function to evaluate the expression
evali :: AST -> Int

I would just like to be pointed in the right direction of where i might start. I have little knowledge of Haskell and FP in general and trying to solve this task i made some string handling function out of Java which made me realize that i'm way off track.
So a little pointer in the right direction, and maybe an explantion to 'how' the AST should look like
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution! Thanks in advance!
Edit

I might have been unclear: I'm wondering how i should go about from having read and tokenized an input string to making an AST.

解决方案

Parsing tokens into an Abstract Syntax Tree

OK, let's take your grammar

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

This is a nice easy grammar, because you can tell from the first token what sort of epression it will be. (If there was something more complicated, like + coming between numbers, or - being used for subtraction as well as negation, you'd need the list-of-successes trick, explained in Functional Parsers.)

Let's have some sample raw input:

rawinput = "- 6 + 45 let x = - 5 in * x x"

Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))", and I'll assume you tokenised it as

tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]

which fits the grammar, but you might well have got

tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]

which fits your sample AST better. I think it's good practice to name your AST after bits of your grammar, so I'm going to go ahead and replace

data AST = Leaf Int | Sum AST AST | Min AST | ...

with

data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char 
                      | E_Let {letvar::Char,letequal:: Expr,letin::Expr}
 deriving Show

I've named the bits of an E_Let to make it clearer what they represent.

Writing a parsing function

You could use isDigit by adding import Data.Char (isDigit) to help out:

expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
             | s == "-" = let (e,ss') = expr ss in (E_Neg e,ss') 
             | s == "+" = (E_Sum e e',ss'') where
                          (e,ss') = expr ss
                          (e',ss'') = expr ss'
            -- more cases

Yikes! Too many let clauses obscuring the meaning, and we'll be writing the same code for E_Prod and very much worse for E_Let. Let's get this sorted out!

The standard way of dealing with this is to write some combinators; instead of tiresomely threading the input [String]s through our definition, define ways to mess with the output of parsers (map) and combine multiple parsers into one (lift).

Clean up the code 1: map

First we should define pmap, our own equivalent of the map function so we can do pmap E_Neg (expr1 ss) instead of let (e,ss') = expr1 ss in (E_Neg e,ss')

pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))

nonono, I can't even read that! We need a type synonym:

type Parser a = [String] -> (a,[String])

pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = ss -> let (a,ss') = p ss 
                  in (f a,ss') 

But really this would be better if I did

data Parser a = Par [String] -> (a,[String])

so I could do

instance Functor Parser where
  fmap f (Par p) = Par (pmap f p)

I'll leave that for you to figure out if you fancy.

Clean up the code 2: combining two parsers

We also need to deal with the situation when we have two parsers to run, and we want to combine their results using a function. This is called lifting the function to parsers.

liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = ss0 -> let
              (a,ss1) = p1 ss0
              (b,ss2) = p2 ss1
              in (f a b,ss2)

or maybe even three parsers:

liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d

I'll let you think how to do that. In the let statement you'll need liftP5 to parse the sections of a let statement, lifting a function that ignores the "=" and "in". You could make

equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s

and a couple more to help out with this.

Actually, pmap could also be called liftP1, but map is the traditional name for that sort of thing.

Rewritten with the nice combinators

Now we're ready to clean up expr:

expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
            | s == "-" = pmap   E_Neg expr ss
            | s == "+" = liftP2 E_Sum expr expr ss
            -- more cases

That'd all work fine. Really, it's OK. But liftP5 is going to be a bit long, and feels messy.

Taking the cleanup further - the ultra-nice Applicative way

Applicative Functors is the way to go. Remember I suggested refactoring as

data Parser a = Par [String] -> (a,[String])

so you could make it an instance of Functor? Perhaps you don't want to, because all you've gained is a new name fmap for the perfectly working pmap and you have to deal with all those Par constructors cluttering up your code. Perhaps this will make you reconsider, though; we can import Control.Applicative, then using the data declaration, we can define <*>, which sort-of means then and use <$> instead of pmap, with *> meaning <*>-but-forget-the-result-of-the-left-hand-side so you would write

expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr

Which looks a lot like your grammar definition, so it's easy to write code that works first time. This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things. You'd only have to define fmap, <*> and pure, all simple ones, and no long repetative liftP3, liftP4 etc.

Read up about Applicative Functors. They're great.

Hints for making Parser applicative: pure doesn't change the list. <*> is like liftP2, but the function doesn't come from outside, it comes as the output from p1.

这篇关于Haskell 解析器到 AST 数据类型,赋值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆