Haskell 解析器到 AST 数据类型,赋值 [英] Haskell parser to AST data type, assignment
问题描述
我已经在互联网上搜索了几天,试图找到我的问题的答案,我终于承认失败了.
我得到了一个语法:
挖掘::= 0 |1 |2 |3 |4 |5 |6 |7 |8 |9
整数 ::= 挖 |挖掘国际
变量 ::= a |乙 |... z |一个 |乙 |C |... |Z
表达式 ::= 整数 |- 表达式 |+ Expr Expr |* Expr Expr |无功|让 Var = Expr in Expr
我被告知使用这种语法解析、评估和打印表达式
其中运算符 * + -
具有其正常含义
具体任务是写一个函数parse::String ->AST
以字符串作为输入并在输入格式正确时返回抽象语法树(我可以假设它是).
我被告知我可能需要一个合适的数据类型,并且该数据类型可能需要从其他一些类派生.
遵循示例输出data AST = Leaf Int |总和 AST |最小 AST |...
此外,我应该考虑编写一个函数tokens::String ->[字符串]
将输入字符串拆分为令牌列表
解析应该用来完成ast::[字符串] ->(AST,[String])
其中输入是一个标记列表,它输出一个 AST,为了解析子表达式,我应该简单地递归使用 ast 函数.
我还应该制作一个printExpr方法来打印结果,以便printE: AST ->字符串
printE(parse "* 5 5")
产生 "5*5"
或 "(5*5)"
以及评估表达式的函数evali :: AST ->内部
我只想指出我可能开始的正确方向.我一般对 Haskell 和 FP 知之甚少,为了解决这个任务,我用 Java 制作了一些字符串处理函数,这让我意识到我偏离了轨道.
所以在正确的方向上有一个小指针,也许是对 AST 应该是什么样子的如何"的解释
连续第三天仍然没有运行代码,我真的很感谢任何帮助我找到解决方案的尝试!提前致谢!
编辑
我可能不清楚:我想知道我应该如何从读取和标记输入字符串到制作 AST.
将标记解析为抽象语法树
好的,让我们来看看你的语法
Dig ::= 0 |1 |2 |3 |4 |5 |6 |7 |8 |9整数 ::= 挖 |挖掘智力变量 ::= a |乙 |... z |一个 |乙 |C |... |Z表达式 ::= 整数 |- 表达式 |+ Expr Expr |* Expr Expr |无功|让 Var = Expr in Expr
这是一个很好的简单语法,因为您可以从第一个标记中看出它将是哪种类型的表达.(如果有更复杂的东西,比如 +
介于数字之间,或者 -
被用作减法以及否定,你需要成功列表技巧,解释在函数解析器.)
让我们有一些示例原始输入:
rawinput = "- 6 + 45 let x = - 5 in * x x"
我从语法上理解的代表"(- 6 (+ 45 (let x=-5 in (* x x))))"
,我假设你把它标记为
tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]
符合语法,但你可能已经得到
tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]
哪个更适合您的示例 AST
.我认为用你的语法来命名你的 AST 是个好习惯,所以我要继续替换
data AST = Leaf Int |总和 AST |最小 AST |...
与
data Expr = E_Int Int |E_Neg Expr |E_Sum Expr Expr |E_Prod Expr Expr |E_Var 字符|E_Let {letvar::Char,letequal:: Expr,letin::Expr}导出显示
我已经命名了 E_Let
的位,以使其更清楚地代表什么.
编写解析函数
你可以通过添加import Data.Char (isDigit)
来使用isDigit
来帮助:
expr :: [String] ->(表达式,[字符串])expr [] = 错误意外的输入结束"expr (s:ss) |all isDigit s = (E_Int (read s),ss)|s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')|s == "+" = (E_Sum e e',ss'') 其中(e,ss') = expr ss(e',ss'') = expr ss'-- 更多案例
哎呀!太多的 let 子句掩盖了含义,我们将为 E_Prod
编写相同的代码,而为 E_Let
编写更糟糕的代码.让我们解决这个问题!
处理这个的标准方法是编写一些组合器;与其通过我们的定义无聊地将输入 [String]
线程化,而是定义方法弄乱解析器(映射)的输出并组合多个解析器合二为一(提升).
清理代码1:map
首先我们应该定义 pmap
,我们自己的 map
等价函数,这样我们就可以做 pmap E_Neg (expr1 ss)
而不是 let (e,ss') = expr1 ss in (E_Neg e,ss')
pmap :: (a -> b) ->([字符串] -> (a,[字符串])) ->([字符串] -> (b,[字符串]))
nonono,我什至看不懂!我们需要一个类型同义词:
type Parser a = [String] ->(a,[字符串])pmap :: (a -> b) ->解析器 a ->解析器bpmap f p = ss ->让 (a,ss') = p ss在 (f a,ss')
但如果我真的这样做会更好
data Parser a = Par [String] ->(a,[字符串])
这样我就可以了
instance Functor Parser wherefmap f (Par p) = Par (pmap f p)
如果你喜欢,我会把它留给你去弄清楚.
清理代码2:合并两个解析器
我们还需要处理有两个解析器要运行的情况,并且我们想使用一个函数来组合他们的结果.这称为将函数提升到解析器.
liftP2 :: (a -> b -> c) ->解析器 a ->解析器 b ->解析器cLiftP2 f p1 p2 = ss0 ->让(a,ss1) = p1 ss0(b,ss2) = p2 ss1在 (f a b,ss2)
甚至三个解析器:
liftP3 :: (a -> b -> c -> d) ->解析器 a ->解析器 b ->解析器 c ->解析器
我会让你想想怎么做.在 let 语句中,您需要 liftP5
来解析 let 语句的各个部分,提升一个忽略 "="
和 "in"
的函数.你可以做
equals_::Parser()equals_ [] = 错误equals_:预期 = 但输入结束"等于_ ("=":ss) = ((),ss)equals_ (s:ss) = error $ "equals_: expected = but got "++s
还有一些可以帮助解决这个问题.
实际上,pmap
也可以称为 liftP1
,但 map 是这种东西的传统名称.
用漂亮的组合器重写
现在我们准备好清理expr
:
expr :: [String] ->(表达式,[字符串])expr [] = 错误意外的输入结束"expr (s:ss) |all isDigit s = (E_Int (read s),ss)|s == "-" = pmap E_Neg expr ss|s == "+" = liftP2 E_Sum expr expr ss-- 更多案例
那一切正常.真的,没关系.但是 liftP5
会有点长,而且感觉很乱.
进一步清理 - 非常好的 Applicative 方式
应用函子是要走的路.记得我建议重构为
data Parser a = Par [String] ->(a,[字符串])
所以你可以让它成为Functor
的实例?也许你不想,因为你所获得的只是一个完美运行的 pmap
的新名称 fmap
和您必须处理所有那些使您的代码变得混乱的 Par
构造函数.不过,也许这会让你重新考虑;我们可以import Control.Applicative
,然后使用 data
声明,我们可以定义 <*>
,这意味着 then
并使用 <$>
而不是 pmap
>,带有*>
的意思<*>-but-forget-the-the-result-of-the-left-hand-side
所以你会写
expr (s:ss) |s == "let" = E_Let <$>变量 *>等于_<*>expr <*>在_ *>表达式
这看起来很像您的语法定义,因此很容易编写第一次运行的代码.这就是我喜欢编写解析器的方式.事实上,这就是我喜欢写很多东西的方式.你只需要定义 fmap
、<*>
和 pure
,都是简单的,不需要重复的 liftP3
、liftP4
等
阅读有关应用函子的信息.他们很棒.
使解析器适用的提示:pure
不会更改列表.<*>
和 liftP2
类似,但函数不是来自外部,它来自 p1
的输出.>
I have been searching around the interwebs for a couple of days, trying to get an answer to my questions and i'm finally admitting defeat.
I have been given a grammar:
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
And i have been told to parse, evaluate and print expressions using this grammar
where the operators * + -
has their normal meaning
The specific task is to write a function parse :: String -> AST
that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which i can asume it is).
I am told that i might need a suitable data type and that data type might need to derive from some other classes.
Following an example output
data AST = Leaf Int | Sum AST AST | Min AST | ...
Further more, i should consider writing a function
tokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished with
ast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.
I should also make a printExpr method to print the result so that
printE: AST -> String
printE(parse "* 5 5")
yields either "5*5"
or "(5*5)"
and also a function to evaluate the expression
evali :: AST -> Int
I would just like to be pointed in the right direction of where i might start. I have little knowledge of Haskell and FP in general and trying to solve this task i made some string handling function out of Java which made me realize that i'm way off track.
So a little pointer in the right direction, and maybe an explantion to 'how' the AST should look like
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution!
Thanks in advance!
Edit
I might have been unclear: I'm wondering how i should go about from having read and tokenized an input string to making an AST.
Parsing tokens into an Abstract Syntax Tree
OK, let's take your grammar
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
This is a nice easy grammar, because you can tell from the first token what sort of epression it will be.
(If there was something more complicated, like +
coming between numbers, or -
being used for subtraction as
well as negation, you'd need the list-of-successes trick, explained in
Functional Parsers.)
Let's have some sample raw input:
rawinput = "- 6 + 45 let x = - 5 in * x x"
Which I understand from the grammar represents "(- 6 (+ 45 (let x=-5 in (* x x))))"
,
and I'll assume you tokenised it as
tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]
which fits the grammar, but you might well have got
tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]
which fits your sample AST
better. I think it's good practice to name your AST after bits of your grammar,
so I'm going to go ahead and replace
data AST = Leaf Int | Sum AST AST | Min AST | ...
with
data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char
| E_Let {letvar::Char,letequal:: Expr,letin::Expr}
deriving Show
I've named the bits of an E_Let
to make it clearer what they represent.
Writing a parsing function
You could use isDigit
by adding import Data.Char (isDigit)
to help out:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = let (e,ss') = expr ss in (E_Neg e,ss')
| s == "+" = (E_Sum e e',ss'') where
(e,ss') = expr ss
(e',ss'') = expr ss'
-- more cases
Yikes! Too many let clauses obscuring the meaning,
and we'll be writing the same code for E_Prod
and very much worse for E_Let
.
Let's get this sorted out!
The standard way of dealing with this is to write some combinators;
instead of tiresomely threading the input [String]
s through our definition, define ways to
mess with the output of parsers (map) and combine
multiple parsers into one (lift).
Clean up the code 1: map
First we should define pmap
, our own equivalent of the map
function so we can do pmap E_Neg (expr1 ss)
instead of let (e,ss') = expr1 ss in (E_Neg e,ss')
pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))
nonono, I can't even read that! We need a type synonym:
type Parser a = [String] -> (a,[String])
pmap :: (a -> b) -> Parser a -> Parser b
pmap f p = ss -> let (a,ss') = p ss
in (f a,ss')
But really this would be better if I did
data Parser a = Par [String] -> (a,[String])
so I could do
instance Functor Parser where
fmap f (Par p) = Par (pmap f p)
I'll leave that for you to figure out if you fancy.
Clean up the code 2: combining two parsers
We also need to deal with the situation when we have two parsers to run, and we want to combine their results using a function. This is called lifting the function to parsers.
liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftP2 f p1 p2 = ss0 -> let
(a,ss1) = p1 ss0
(b,ss2) = p2 ss1
in (f a b,ss2)
or maybe even three parsers:
liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d
I'll let you think how to do that.
In the let statement you'll need liftP5
to parse the sections of a let statement,
lifting a function that ignores the "="
and "in"
. You could make
equals_ :: Parser ()
equals_ [] = error "equals_: expected = but got end of input"
equals_ ("=":ss) = ((),ss)
equals_ (s:ss) = error $ "equals_: expected = but got "++s
and a couple more to help out with this.
Actually, pmap
could also be called liftP1
, but map is the traditional name for that sort of thing.
Rewritten with the nice combinators
Now we're ready to clean up expr
:
expr :: [String] -> (Expr,[String])
expr [] = error "unexpected end of input"
expr (s:ss) | all isDigit s = (E_Int (read s),ss)
| s == "-" = pmap E_Neg expr ss
| s == "+" = liftP2 E_Sum expr expr ss
-- more cases
That'd all work fine. Really, it's OK. But liftP5
is going to be a bit long, and feels messy.
Taking the cleanup further - the ultra-nice Applicative way
Applicative Functors is the way to go. Remember I suggested refactoring as
data Parser a = Par [String] -> (a,[String])
so you could make it an instance of Functor
? Perhaps you don't want to,
because all you've gained is a new name fmap
for the perfectly working pmap
and
you have to deal with all those Par
constructors cluttering up your code.
Perhaps this will make you reconsider, though; we can import Control.Applicative
,
then using the data
declaration, we can
define <*>
, which sort-of means then
and use <$>
instead of pmap
, with *>
meaning
<*>-but-forget-the-result-of-the-left-hand-side
so you would write
expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr
Which looks a lot like your grammar definition, so it's easy to write code that works first time.
This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things.
You'd only have to define fmap
, <*>
and pure
, all simple ones, and no long repetative liftP3
, liftP4
etc.
Read up about Applicative Functors. They're great.
Hints for making Parser applicative: pure
doesn't change the list.
<*>
is like liftP2
, but the function doesn't come from outside, it comes as the output from p1
.
这篇关于Haskell 解析器到 AST 数据类型,赋值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!