在Haskell中从头开始编写解析器 [英] Writing a parser from scratch in Haskell
问题描述
我发现:
$ b $有没有很好的教程,可以从头开始为Haskell中的给定语法编写解析器? b
但所有这些都使用parsec库,并且虽然这对工业应用程序可能很有意思,但我特别寻找不使用复杂库的'自下而上'的例子。
我发现了唯一一个使用'基本'Haskell是这样的:用Haskell解析
使用一些非常外语的语法(很难区分程序的什么部分或什么是'伪代码')和没有明确的语法定义。
任何建议都非常感谢!
从零开始构建Parsec实际上非常容易。实际的库代码本身是大量泛化和优化的,它扭曲了核心抽象,但如果你只是从头开始构建一些东西来了解更多关于发生的事情,你可以用几行代码来编写它。我将在下面构建一个稍微弱一点的 Applicative
解析器。 实际上,我们想要生成一个 Applicative
, Parser
以及一个原始解析器值
satisfy ::(Char - > Bool) - >解析器Char
和一些组合器,如 try
,如果失败则解除解析器
try :: Parser a - >解析器a
和 orElse
这可以让我们如果第一个解析器失败,则继续使用第二个解析器。通常这实际上是用中缀组合器(< |>)
orElse,(< |>):: Parser a - >解析器a - >解析器a
由于我们的 Applicative
需要保留跟踪当前流状态并能够失败,我们将通过组合状态 Applicative
和 Either
应用。
type Error = String
newtype解析器a = P {unP :: String - > (字符串,或者错误a)}
实例Functor解析器其中
fmap f(P st)= P $ \stream - >案例st
(res,Left err) - > (res,Left err)
(res,Right a) - > (res,Right(fa))
实例适用解析器其中
pure a = P(\ stream - >(stream,Right a))
P ff< * GT; P xx = P $ \stream0 - >产生f
(stream1,Left err) - >的情况。 (stream1,Left err)
(stream1,Right f) - > case xx stream1 of - 产生一个x
(stream2,Left err) - > (stream2,Left err)
(stream2,Right x) - > (stream2,Right(fx)) - return(fx)
如果我们遵循<$ c在 Applicative
实例中的$ c>(< *>)方法中,我们仔细地看到它只是将流传入 f
--producing Parser
,接受结果流,如果成功,将它传递给 x
-producing Parser
,如果它们都成功,它将返回它们的应用程序(fx)
。这意味着如果我们有函数生成分析器和参数生成分析器,我们可以用(<>)
- 给出
parseChar :: Char - >解析器Char
$ b parseHi :: Parser(Char,Char) - 解析'h'然后解析'i'
parseHi = pure(,)< $> parseChar'h'*> parseChar'i'
我们可以使用 Applicative
来构建所需的组合器。这是满足
- |查看下一个字符并成功返回,如果它满足谓词
满足::(Char - > Bool) - >解析器Char
满足f = P $ \stream - >
[]的案例流 - >> ([],左边流)
(c:cs)| f c - > (cs,Right c)
|否则 - > (cs,Left不满足)
这里是 try
- |运行解析器,但如果失败则将流恢复为原始状态
try :: Parser a - >解析器a
try(P f)= P $ \stream0 - > case $ stream $
(_,Left err) - > (stream0,Left err)
(stream1,Right a) - > (stream1,Right a)
以下是 orElse
orElse :: Parser a - >解析器a - >解析器a
orElse(P f1)(P f2)= P $ \stream0 - >
(stream1,Left err)的情况f1 stream0 - > f2 stream1
(stream1,Right a) - > (stream1,Right a)
通常在这一点上我们还注意到 Parser如果我们还提供了一个立即失败的解析器,
就会用或者
替代
c $ c> empty
instance Alternative Parser其中
empty = P $ \流 - > (stream,Leftempty)
(< |>)= orElse
many = manyParser
some = someParser
我们可以写 manyParser
和 someParser
作为重复运行解析器的组合器。
- | 0或更多
manyParser :: Parser a - > Parser [a]
manyParser(P f)= P go其中
go stream = case f stream of
(_,Left err) - > (stream,Right []) - 抛出错误
(stream',Right a) - >案例转到'
(streamFin,Left err) - > (streamFin,Left err)
(streamFin,Right as) - > (streamFin,Right(a:as))
- | 1或更多
someParser :: Parser a - >解析器[a]
someParser(P f)= P $ \ stream - >案例f
(stream',Left err) - > (stream',Left err)
(stream',Right a) - >
let(P fmany)= manyParser(P f)
在
(stream'',Left err) - > (stream'',Left err)
(stream'',Right as) - > (stream'',Right(a:as))
从这里我们可以开始工作更高级别的抽象。
char :: Char - >解析器char
char c =满足(== c)
string :: String - >解析器字符串
string [] = pure []
string(c:cs)=(:)< $> char c *字符串cs
oneOf :: [Char] - >解析器Char
oneOf cs =满足(`elem` cs)
parens ::解析器a - >解析器a
parens parseA = dropFirstAndLast< $> char'('*'parseA *'char')'
其中
dropFirstAndLast _ a _ = a
Is there any good tutorial for writing a parser for a given grammar in Haskell from scratch?
I found:
but all of these use the parsec library, and while that may be interesting for industrial applications I'm specifically looking for examples that are 'bottom up' without the use of sophisticated libraries.
The only one I found which uses 'basic' Haskell is this: Parsing with Haskell which uses some very foreign syntax (it's very hard to distinguish whats part of the program or whats only 'pseudocode') and there's no explicit grammar definition.
Any suggestions are highly appreciated!
It's actually surprisingly easy to build Parsec-from-scratch. The actual library code itself is heavily generalized and optimized which contorts the core abstraction, but if you're just building things from scratch to understand more about what's going on you can write it in just a few lines of code. I'll build a slightly weaker Applicative
parser below.
Essentially, we want to produce a Applicative
, Parser
along with a primitive parser value
satisfy :: (Char -> Bool) -> Parser Char
and a few combinators such as try
, which "undoes" a parser if it fails
try :: Parser a -> Parser a
and orElse
which allows us to continue with a second parser if the first parser fails. Usually this is actually written with the infix combinator (<|>)
orElse, (<|>) :: Parser a -> Parser a -> Parser a
Since our Applicative
needs to keep track of the current stream state and be able to fail, we'll build it by combining the state Applicative
and the Either
applicative.
type Error = String
newtype Parser a = P { unP :: String -> (String, Either Error a) }
instance Functor Parser where
fmap f (P st) = P $ \stream -> case st stream of
(res, Left err) -> (res, Left err)
(res, Right a ) -> (res, Right (f a))
instance Applicative Parser where
pure a = P (\stream -> (stream, Right a))
P ff <*> P xx = P $ \stream0 -> case ff stream0 of -- produce an f
(stream1, Left err) -> (stream1, Left err)
(stream1, Right f ) -> case xx stream1 of -- produce an x
(stream2, Left err) -> (stream2, Left err)
(stream2, Right x ) -> (stream2, Right (f x)) -- return (f x)
If we follow the (<*>)
method in the Applicative
instance carefully we see that it just passes the stream into the f
-producing Parser
, takes the result stream and, if successful, passes it to the x
-producing Parser
and if they both succeed it returns their application (f x)
. This means that if we have a function-producing parser and an argument-producing parser we can sequence them with (<*>)
-- given
parseChar :: Char -> Parser Char
parseHi :: Parser (Char, Char) -- parses 'h' then 'i'
parseHi = pure (,) <$> parseChar 'h' <*> parseChar 'i'
We can use the mechanics of this Applicative
to build the needed combinators as well. Here's satisfy
-- | Peek at the next character and return successfully if it satisfies a predicate
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = P $ \stream -> case stream of
[] -> ([], Left "end of stream")
(c:cs) | f c -> (cs, Right c)
| otherwise -> (cs, Left "did not satisfy")
And here's try
-- | Run a parser but if it fails revert the stream to it's original state
try :: Parser a -> Parser a
try (P f) = P $ \stream0 -> case f stream0 of
(_ , Left err) -> (stream0, Left err)
(stream1, Right a ) -> (stream1, Right a )
And here's orElse
orElse :: Parser a -> Parser a -> Parser a
orElse (P f1) (P f2) = P $ \stream0 -> case f1 stream0 of
(stream1, Left err) -> f2 stream1
(stream1, Right a ) -> (stream1, Right a)
Typically at this point we also note that Parser
forms an Alternative
instance with orElse
if we also provide an immediately failing parser, empty
instance Alternative Parser where
empty = P $ \stream -> (stream, Left "empty")
(<|>) = orElse
many = manyParser
some = someParser
And we can write manyParser
and someParser
as combinators which run a parser repeatedly.
-- | 0 or more
manyParser :: Parser a -> Parser [a]
manyParser (P f) = P go where
go stream = case f stream of
(_ , Left err) -> (stream, Right []) -- throws away the error
(stream', Right a ) -> case go stream' of
(streamFin, Left err) -> (streamFin, Left err)
(streamFin, Right as) -> (streamFin, Right (a : as))
-- | 1 or more
someParser :: Parser a -> Parser [a]
someParser (P f) = P $ \stream -> case f stream of
(stream', Left err) -> (stream', Left err)
(stream', Right a ) ->
let (P fmany) = manyParser (P f)
in case fmany stream' of
(stream'', Left err) -> (stream'', Left err)
(stream'', Right as) -> (stream'', Right (a:as))
And from here we can begin to work at much higher levels of abstraction.
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = pure []
string (c:cs) = (:) <$> char c <*> string cs
oneOf :: [Char] -> Parser Char
oneOf cs = satisfy (`elem` cs)
parens :: Parser a -> Parser a
parens parseA = dropFirstAndLast <$> char '(' <*> parseA <*> char ')'
where
dropFirstAndLast _ a _ = a
这篇关于在Haskell中从头开始编写解析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!