使用Parsec解析非二进制运算符 [英] Parsing non binary operators with Parsec

查看:51
本文介绍了使用Parsec解析非二进制运算符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

传统上,算术运算符被认为是二进制运算符(左或右关联),因此大多数工具仅处理二进制运算符.

Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.

有没有一种简便的方法可以使用Parsec解析算术运算符,Parsec可以具有任意数量的参数?

Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?

例如,以下表达式应解析到树中

For example, the following expression should be parsed into the tree

(a + b) + c + d * e + f

推荐答案

是的!关键是首先要解决一个更简单的问题,那就是将 + * 建模为只有两个孩子的树节点.要添加四件事,我们将使用 + 三次.

Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.

这是一个很重要的问题,因为有一个 Text.Parsec.Expr 模块可以解决此问题.您的示例实际上可以通过示例代码中的示例进行解析文档.我在这里做了一些简化:

This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:

module Lib where

import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens

data Expr =
    Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"

-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell

-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft

parser =
  Expr.buildExpressionParser table term <?> "expression"
  where
    table = [[infix_ "*" Multiply], [infix_ "+" Add]]

term =
  parens parser
  <|> (Identifier <$> identifier)
  <?> "term"

在GHCi中运行此程序

Running this in GHCi:

λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)

有很多方法可以将此树转换为所需的形式.这是一个骇人听闻的速度缓慢的东西:

There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:

data Expr' =
    Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)

collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | (f e == False) = [e]
collect e@(Add l r) f =
  collect l f ++ collect r f
collect e@(Multiply l r) f =
  collect l f ++ collect r f

isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False

isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False

optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e@(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))

但是,我要指出的是,出于解析器或编译器的目的,几乎总是 Expr 是Good Enough™.

I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.

这篇关于使用Parsec解析非二进制运算符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆