如何在Haskell中实现词法分析器和解析器 [英] how to implement lexical analyser and parser in Haskell

查看:234
本文介绍了如何在Haskell中实现词法分析器和解析器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这里有一段代码,它是一个用Haskell编写的命令式编程语言编写的程序,所以问题是如何为这种语言实现词法分析器和解析器,程序被定义为一系列包含6种类型的语句:= =,goto,write,stop,if goto和int


  1. int n = 5

  2. 写入n

  3. int fac = 1
  4. > if0 n goto 8
  5. fac:= fac * n

  6. n:= n-1

  7. $ b

    我有点迷失在这里,我已经阅读过词法分析器和解析器,但没有找到任何示例来实现它们,如果能够给我一段代码,我会很感激,这样我就可以自己尝试去做,或者至少有用的信息链接

    解决方案

    开始使用Parsec解析



    我不会为你写整件事,但我会得到你你开始每一点。我们将经历的三个阶段是:


    1. 定义语法
    2. 创建抽象语法树。 (这看起来像语法,所以会很容易。)

    3. 编写一个解析器。 (这也看起来像语法,所以很容易。)

    我们可以在2到3之间做一个单独的lexing阶段,但Parsec很乐意做这两个级别。 Lexing是将输入分成令牌的位置 - 有意义的输入位 - 相当于人类语言中的单词,也称为词位。跳过一个单独的lexing阶段意味着我们需要更清楚地了解空白等。

    1。语法



    首先您需要定义语法。

      program :: = line {[newline line]}最好用纸和铅笔做完,但我会让你开始: 
    line :: = num dot空白语句
    语句:: = declaration |写| | ifstatement | goto |分配| stop
    declaration =Intwhitespace varname equals int
    varname = letter {[alphanum]}
    - 这里有更多的东西,包括更有趣的if语句:
    ifstatement =if whitespace表达式空白等于表达式空白语句


    - 低级别的东西:
    dot =。
    int = digit {[digit]}
    digit =0| 1| ... | 9
    whitespace =okspace | \tokspace
    okspace =|空白

    想想如何与您的示例程序匹配,并考虑如何完成它off:

      1。 Int n = 5 
    2.写n
    3. Int fac = 1
    4. if 0 n goto 8 - 不寻常
    5. fac:= fac * n
    6. n:= n + 1 - complex
    7. goto 4
    8. write fac
    9. stop
    = 或 ==



    / code>在里面。也许这是为了简化语法,并且它只能接受单个变量或整数,并且有一个空格。也许这是一个错字,你的意思是有一个等号和任意表达式。找出并重写语法中的 ifstatement 部分。



    第6行中的赋值很复杂,因为这里你必须解析任意的算术表达式。据我记得有大量的例子,所以我现在很乐意跳过。如果你遇到那样的问题,请再说一次,但希望你已经建立了解析技巧,其余部分先解决。

    2。抽象语法树(AST)



    抽象语法树表示组成输入的标记组合。在Haskell中,我们可以定义我们自己以适应上下文,这会让生活变得更简单。 $ b

    我其实编译这个答案(一种检查错字等的好方法),它是为什么我需要在代码顶部的一些声明:

      module ParseLang where 

    import Text .Parsec hiding(Line)
    import Text.Parsec.String
    import Control.Applicative hiding((< |>),many)

    我们将制作程序列表 s,但强制解析器必须至少有一个。

      type Program = [Line] 

    对于,它需要一个数字,并且一个声明,但点只是我们不需要存储的语法。我们可以将行号存储为 Int ,因此尽管类型声明中允许使用负数,解析器也不会接受负数。

      data Line =行{aNum :: Int,aStatement :: Statement} 
    派生显示

    多个选项很容易定义:

      data语句=声明VarName Int 
    |写VarName
    | IfStatement表达式表达式
    |转到LineNumber
    |赋值VarName表达式
    |停止
    派生显示

    请注意,没有使用所有语法cruft / connectives /等号只是变化的一点。



    我在那里停下 - 你可以完成:

      data表达式=表达式 - 我已经为你留下了这个
    派生显示

    类型VarName =字符串 - 可以使用新类型安全类型为
    type LineNumber = Int

    较低级别的语法不需要在AST中表示,将使用字符串。



    3。解析器



    这一点现在很好,很容易。

      num :: Parser Int 
    num = read< ; $>许多数字

    我们使用< $> ,它是通过导入 Control.Applicative 得到的 fmap 的同义词。在这里,它改变了解析器使用左侧纯函数返回的值,在这种情况下, read 。查看这个其他答案,了解 fmap



    让我们创建一个解析器来解析字符串,然后解析一些空格:

      whitespace = space>>空格 - 空格然后可选空格

    lit :: String - >解析器字符串
    lit xs = string xs <* whitespace

    现在< * 很有趣。它看起来像<> ,它实际上组合了两个解析器,并与< $> 实际上将纯函数映射到结果上。 *> < * 结合了两个解析器,但忽略了其中一个解析器的输出,所以 stringgoto< * whitespace 解析了一个goto和一些空格,但是抛弃了空格。



    现在我们准备解析goto语句:

      goto :: Parser声明
    goto =转到< $> (点亮goto*> num)

    现在我们来看看varName

      varName :: Parser VarName 
    varName =(:)< $>字母< *>很多(alphaNum< |> oneOf'_)

    有几件事正在进行那里。

    1 < |> 是另一种选择 - 一种或另一种,所以(alphaNum< |> oneOf'_)接受一个字母数字字符或这对无辜字符' _ 你可能想要包含在一个变量名中。

    2。 f <$>解析器1 * parser2 是结合解析器的一种非常好的方式。它运行parser1,然后运行parser2,然后将函数 f 映射到它们生成的结果f上。它适用于很多解析器:

       -  ifstatement =ifwhitespace表达式whitespace等于空白表达式空白语句
    ifStatement ::解析器语句
    ifstatement = IfStatement
    < $> (如果>>>表达,则点亮)
    < *> (点亮=>>表达式) - 我put = in,但在下面看到
    < *> (空白>>语句) - 我会更快乐,然后在这里

    如果你只允许一个 VarName Int 而不是一般表达式,那么你不需要等号。



    以下是你如何把它放在一起:

     语句::解析器语句
    语句=转到
    < |>停止
    < |>声明
    < |>写
    < |> ifStatement
    < |>赋值


    - 程序:: =行{[新行]]
    程序::解析程序
    程序=多行1行


    --line :: = num dot空白语句
    行::解析器行
    行=行<$> (num< * char'。')< *> (语句< * char'\\\
    ')

    但是我会给你留下错误信息每次尝试使用尚未完成的解析器时,整个过程都会编译成OK,并且您定义的位应该可以工作。

      stop = error您还没有定义停止
    declaration = error您尚未定义声明
    write = error您尚未定义写入
    ifStatement =错误您尚未定义ifStatement
    赋值=错误您尚未定义赋值
    表达式=错误您尚未定义表达式


    I got this piece of code here, it is a program written in an imperative programming language structured in Haskell, so the question is "how can I implement a lexer analyser and parser for this language", the program is defined to be a sequence of statements of which there are 6 types: ":=", "goto", "write", "stop", "if goto" and "int"

    1. int n=5
    2. write n
    3. int fac=1
    4. if0 n goto 8
    5. fac := fac * n
    6. n := n-1
    7. goto 4
    8. write fac
    9. stop

    I'm kinda lost here, I've read about lexers and parsers, but did not find any example how to implement them,I will appreciate if you could give me piece of code, so that I could try to do it myself, or at least links with useful info

    解决方案

    Getting Started Parsing with Parsec

    I'll not write the whole thing for you, but I'll get you started with each bit. The three stages we'll go through are:

    1. Define the grammar
    2. Make an Abstract Syntax Tree. (This will look like the grammar, so will be quite easy.)
    3. Write a parser. (This will also look like the grammar, so will be quite easy.)

    We could make a separate lexing stage between 2 and 3, but Parsec is happy to do both levels. Lexing is where you split the input into tokens - bits of the input that make sense - the equivalent of words in human language, which are also known as lexemes. Skipping a separate lexing phase means we need to be a little more explicit about whitespace etc.

    1. Grammar

    First you need to define the grammar. Best done with paper and pencil, but I'll get you started:

      program ::= line {[newline line]}
      line ::= num dot whitespace statement
      statement ::= declaration | write | ifstatement | goto | assignment | stop
      declaration = "Int" whitespace varname equals int
      varname = letter {[alphanum]}
      -- more things here, including the more interesting ifstatement:
      ifstatement = "if" whitespace expression whitespace equals expression whitespace statement
    
    
      -- lower level stuff:
      dot = "."
      int = digit {[digit]}
      digit = "0" | "1" | ...  | "9"
      whitespace = " " okspace | "\t" okspace
      okspace = "" | whitespace
    

    Have a think about how that matches your sample program, and think about how you'll finish it off:

    1. Int n=5
    2. write n
    3. Int fac=1
    4. if 0 n goto 8          -- unusual
    5. fac := fac * n
    6. n := n+1               -- complex
    7. goto 4
    8. write fac
    9. stop
    

    The if statement in line 4 is unusual because there's no = or == in it. Perhaps that's to simplify the grammar and it can only accept single variables or integers, with a space between. Perhaps that's a typo and you mean to have an equals sign and arbitrary expressions. Find out which and rewrite the ifstatement part of the grammar.

    The assignment in line 6 is complex, because here you have to parse arbitrary arithmetic expressions. As far as I recall there are loads of examples of that knocking about, so I'll gladly skip that for now. Make it another question if you get stuck with that bit, but hopefully you'll have built up your parsing skills with the rest of it first.

    2. Abstract Syntax Tree (AST)

    An Abstract Syntax Tree represents the combination of tokens that make up your input. In Haskell we can define our own to suit the context, which will make life much simpler

    I'm actually compiling this answer (a good way of checking for typos etc), which is why I need some declarations at the top of the code:

    module ParseLang where
    
    import Text.Parsec hiding (Line)
    import Text.Parsec.String
    import Control.Applicative hiding ((<|>), many)
    

    We'll just make a Program a list of Lines, but enforce with the parser that there has to be at least one.

    type Program = [Line]
    

    For a Line, it needs a number, and a statement, but the dot is just syntax we don't need to store. We can store the line number as an Int, so although that allows negative numbers in the type declaration, again the parser won't accept negative.

    data Line = Line {aNum::Int, aStatement :: Statement} 
       deriving Show
    

    Multiple options are easy to define:

    data Statement = Declaration VarName Int
                   | Write VarName
                   | IfStatement Expression Expression Statement
                   | Goto LineNumber
                   | Assignment VarName Expression
                   | Stop
       deriving Show
    

    Notice the absence of all the syntax cruft/connectives/equal signs leaving just the bits that change.

    I'm stopping there - you can finish:

    data Expression = Expression -- I've left this one for you
       deriving Show
    
    type VarName = String   -- could use newtype for type safety for these to
    type LineNumber = Int   
    

    The lower level syntax doesn't need representing in the AST because we'll use strings for it.

    3. The Parser

    This bit is now nice and easy. Let's start at the bottom of the syntax tree and work up.

    num :: Parser Int
    num = read <$> many digit
    

    We've used <$>, which is a synonym for fmap that we got by importing Control.Applicative. Here it alters the values returned by the parser using the pure function on the left, in this case, read. Have a look at this other answer for an introduction to fmap if you're not used to it.

    Let's make a parser that parses a string literal then some whitespace:

    whitespace = space >> spaces  -- a space then optional spaces
    
    lit :: String -> Parser String
    lit xs = string xs <* whitespace
    

    Now that <* is interesting. It looks like <*>, which in effect combines two parsers, and is used in conjunction with <$> which in effect maps a pure function onto the result. *> and <* combine two parsers, but ignore the output of one of them, so string "goto" <* whitespace parses a "goto" and some whitespace, but throws away the whitespace.

    Now we're ready to parse a goto statement:

    goto :: Parser Statement
    goto = Goto <$> (lit "goto" *> num)
    

    Now let's have a go at varName

    varName :: Parser VarName
    varName = (:) <$> letter <*> many (alphaNum <|> oneOf "'_")
    

    A couple of things are going on there.

    1. <|> is alternative choice - one or the other, so (alphaNum <|> oneOf "'_") accepts an alphanumeric character or one of that pair of innocent characters ' and _ you might want to include in a variable name.

    2. f <$> parser1 <*> parser2 is a really nice way of combining parsers. It runs parser1, then parser2, then maps the function f over the results f that they produced. It works with many parsers:

    --ifstatement = "if" whitespace expression whitespace equals whitespace expression whitespace statement
    ifStatement :: Parser Statement
    ifstatement = IfStatement 
              <$> (lit "if" >> expression) 
              <*> (lit "=" >> expression)    -- I put = in, but see below
              <*> (whitespace >> statement)  -- I'd be happier with a then in here
    

    If you only allow a VarName or Int instead of a general Expression, then you don't need the equals sign.

    Here's how you put it together:

    statement :: Parser Statement
    statement = goto
            <|> stop
            <|> declaration 
            <|> write
            <|> ifStatement
            <|> assignment
    
    
    --program ::= line {[newline line]}
    program :: Parser Program
    program = many1 line
    
    
    --line ::= num dot whitespace statement
    line :: Parser Line
    line = Line <$> (num <* char '.') <*> (statement <* char '\n')
    

    But I'll leave you with error messages every time you try to use a parser you've not finished yet, so the whole thing will compile OK and the bits you've defined should work.

    stop =        error "You've not yet defined stop"
    declaration = error "You've not yet defined declaration"
    write =       error "You've not yet defined write"
    ifStatement = error "You've not yet defined ifStatement"
    assignment =  error "You've not yet defined assignment"
    expression =  error "You've not yet defined expression"
    

    这篇关于如何在Haskell中实现词法分析器和解析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆