Haskell Parsec - error messages are less helpful while using custom tokens
I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized that error messages are less helpful when I use tokens other than Parsec's Char tokens.
Here are some examples of Parsec's error messages while using Char tokens:
ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
parse error at (line 1, column 7):
unexpected "w"
expecting space or "ok"
ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
parse error at (line 1, column 1):
unexpected "w"
expecting "ok" or "nop"
So the string parser shows which string was expected when it finds an unexpected one, and the choice parser shows the alternatives.
But when I use the same combinators with my own tokens:
ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf "
parse error at "test" (line 1, column 1):
unexpected end of input
In this case, it doesn't print what was expected.
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 1))
And when I use choice, it doesn't print the alternatives.
I expected this behavior to depend on the combinator functions, not on the tokens, but it seems I was wrong. How can I fix this?
Here's the full lexer + parser code:
Lexer:
module Lexer
    ( Token(..)
    , TokenPos(..)
    , tokenize
    ) where

import Text.ParserCombinators.Parsec hiding (token, tokens)
import Control.Applicative ((<*), (*>), (<$>), (<*>))

data Token = Ide String
           | Number String
           | Bool String
           | LBrack
           | RBrack
           | LBrace
           | RBrace
           | Keyword String
    deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

ide :: Parser TokenPos
ide = do
    pos <- getPosition
    fc  <- oneOf firstChar
    r   <- optionMaybe (many $ oneOf rest)
    spaces
    return $ flip (,) pos $ case r of
                 Nothing -> Ide [fc]
                 Just s  -> Ide $ [fc] ++ s
  where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
        rest      = firstChar ++ ['0'..'9']

parsePos p = (,) <$> p <*> getPosition

lbrack = parsePos $ char '[' >> return LBrack
rbrack = parsePos $ char ']' >> return RBrack
lbrace = parsePos $ char '{' >> return LBrace
rbrace = parsePos $ char '}' >> return RBrace

token = choice
    [ ide
    , lbrack
    , rbrack
    , lbrace
    , rbrace
    ]

tokens = spaces *> many (token <* spaces)

tokenize :: SourceName -> String -> Either ParseError [TokenPos]
tokenize = runParser tokens ()
Parser:
module Parser where

import Text.Parsec as P
import Control.Monad.Identity

import Lexer

parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e    -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = token show snd test
  where test (t', _) = case t == t' of
                           False -> Nothing
                           True  -> Just t
SOLUTION:
Ok, after fp4me's answer and reading Parsec's Char source more carefully, I ended up with this:
{-# LANGUAGE FlexibleContexts #-}
module Parser where

import Text.Parsec as P
import Control.Monad.Identity

import Lexer

parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e    -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'

type Parser a = Parsec [TokenPos] () a

advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos

satisfy :: (TokenPos -> Bool) -> Parser Token
satisfy f = tokenPrim show
                      advance
                      (\c -> if f c then Just (fst c) else Nothing)

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = (Parser.satisfy $ (== t) . fst) <?> show t
Now I get the same kind of error messages as with Char tokens:
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) " asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"
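The tuple in the `unexpected` line above comes from the `show` function passed to `tokenPrim`, which is applied to the whole `(Token, SourcePos)` pair. The following is a minimal, self-contained sketch of the same approach that shows only the token. It makes a few assumptions for illustration: the token stream is built by hand with `newPos` instead of going through the lexer, the `Token` type is trimmed down, and `satisfyTok`, `sample`, `demoChoice` and `demoSeq` are hypothetical names that do not appear in the original code.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Trimmed-down copy of the Token type from the Lexer (assumption: only
-- the constructors needed for this demonstration are kept).
data Token = Ide String | LBrack | RBrack
  deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

type Parser a = Parsec [TokenPos] () a

-- After consuming a token, move to the position stored with the next
-- token; at the end of the stream, keep the current position.
advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos

-- Hypothetical variant of the solution's satisfy: the first argument of
-- tokenPrim renders the offending token for the "unexpected" message, so
-- (show . fst) drops the position from that message.
satisfyTok :: (Token -> Bool) -> Parser Token
satisfyTok f = tokenPrim (show . fst) advance
                         (\(t, _) -> if f t then Just t else Nothing)

-- The <?> label is what supplies the "expecting ..." part of the error.
tok :: Token -> Parser Token
tok t = satisfyTok (== t) <?> show t

-- Hand-built token stream standing in for the lexer output.
sample :: [TokenPos]
sample = [(Ide "asdf", newPos "test" 1 1)]

-- Fails on the first token: none of the alternatives matches.
demoChoice :: Either ParseError Token
demoChoice = parse (choice [tok (Ide "ok"), tok (Ide "nop")]) "test" sample

-- Fails at the end of the stream: a second token is required.
demoSeq :: Either ParseError Token
demoSeq = parse (tok (Ide "asdf") >> tok (Ide "ok")) "test" sample
```

Loading this in ghci and evaluating `demoChoice` should report `unexpected Ide "asdf"` followed by the two expected alternatives, and `demoSeq` should report an unexpected end of input while expecting `Ide "ok"`.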
A first step toward a solution is to define your own choice function in the Parser module, use a dedicated unexpected function to override the unexpected message, and finally use the <?> operator to override the expecting message:
mychoice [] = mzero
mychoice (x:[]) = (tok x <|> myUnexpected) <?> show x
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected) <?> show (x:xs)

myUnexpected = do
    input <- getInput
    unexpected (first input)
  where
    first []    = "eof"
    first (x:_) = show $ fst x
and call your parser like this:
ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected Ide "asdf"
expecting [Ide "ok",Ide "nop"]
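For comparison, a similar message can probably be obtained without a hand-written recursion, by putting one `<?>` label on a plain `choice`. The sketch below is self-contained and rests on assumptions: `oneOfToks` and `demoGroup` are hypothetical names, the token stream is built by hand with `newPos`, and `tok` is a compact restatement of the labeled token parser from the SOLUTION above rather than the original code.

```haskell
import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Trimmed-down copy of the Token type (assumption: illustration only).
data Token = Ide String | LBrack | RBrack
  deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

type Parser a = Parsec [TokenPos] () a

-- Labeled single-token parser: tokenPrim reports the unexpected token,
-- <?> supplies the expected one.
tok :: Token -> Parser Token
tok t = tokenPrim (show . fst) nextPos match <?> show t
  where
    nextPos _ _ ((_, pos) : _) = pos
    nextPos pos _ []           = pos
    match (t', _) = if t == t' then Just t' else Nothing

-- Hypothetical helper: when the whole choice fails without consuming
-- input, the outer <?> replaces the individual "expecting" items with a
-- single group label.
oneOfToks :: [Token] -> Parser Token
oneOfToks ts =
  choice (map tok ts) <?> ("one of " ++ intercalate ", " (map show ts))

-- Fails on the first token of a hand-built stream.
demoGroup :: Either ParseError Token
demoGroup =
  parse (oneOfToks [Ide "ok", Ide "nop"]) "test"
        [(Ide "asdf", newPos "test" 1 1)]
```

The trade-off is that a group label hides the individual alternatives; leaving the outer `<?>` off keeps Parsec's default "a or b" listing built from the labels on each `tok`.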