Parsing data with Parsec and omitting comments


Problem description

I am trying to write a Haskell Parsec parser that parses input data from a file into the LogLine datatype as follows:

--Final parser that holds the individual parsers.
final :: Parser [LogLine]
final = do{ logLines <- sepBy1 logLine eol
        ; return logLines
        }


--The logline token declaration
logLine :: Parser LogLine
logLine = do
    name <- plainValue -- parse the name (identifier)
    many1 space -- parse and throw away a space
    args1 <- bracketedValue -- parse the first arguments
    many1 space -- throw away the second space
    args2 <- bracketedValue -- parse the second list of arguments
    many1 space
    constant <- plainValue -- parse the constant identifier
    space
    weighting <- plainValue -- parse the weighting double
    space
    return $ LogLine name args1 args2 constant weighting

It parses everything just fine, but now I need to add comments to the file, and I have to modify the parser so that it ignores them. It should support single-line comments only, beginning with "--" and ending with a '\n'. I've tried defining the comment token as follows:

comments :: Parser String
comments = do 
    string "--"
    comment <- (manyTill anyChar newline)
    return ""

And then plugging it into the final parser like so:

final :: Parser [LogLine]
final = do 
        optional comments
        logLines <- sepBy1 logLine (comments<|>newline)
        optional comments
        return logLines

It compiles fine, but it does not parse. I've tried several minor modifications, but the best result was parsing everything up to the first comment, so I'm beginning to think that this is not the way to do it.

PS: I've seen this similar question, but it is slightly different from what I'm trying to achieve.

Solution

If I understand your description of the format in your comment correctly, your example for the format would be

name arg1 arg2 c1 weight
-- comment goes here

optionally followed by further log-lines and/or comments.

Then your problem is that there is a newline between the log-line and the comment line. The comments part of the separator parser fails without consuming input (comments must start with "--"), so newline is tried and succeeds. The next line then begins with "--", which makes plainValue fail without consuming input, and that ends the sepBy1.
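To see this behaviour in isolation, here is a minimal sketch that replaces the log line with a single word. The names entry and broken are made up for the illustration, and comments returns () here so that it can be combined with newline under <|>. Note that because the separator has already consumed the newline when the item parser fails, sepBy1 in this sketch ends with a parse error rather than with a shorter list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for logLine: a single word of letters.
entry :: Parser String
entry = many1 letter

-- A single-line comment: "--" up to and including the newline.
-- Returns () (an assumption of this sketch) so it can sit next to newline.
comments :: Parser ()
comments = string "--" >> manyTill anyChar newline >> return ()

-- Separator shaped like the one in the question: a comment OR a newline.
broken :: Parser [String]
broken = sepBy1 entry (comments <|> (newline >> return ()))

-- Input with a comment line between two entries.
-- After "foo" the separator sees '\n', so comments fails without consuming
-- and newline succeeds; entry then meets '-' and fails.
withComment :: Either ParseError [String]
withComment = parse broken "" "foo\n-- note\nbar"

-- Input without comments parses as expected.
withoutComment :: Either ParseError [String]
withoutComment = parse broken "" "foo\nbar"
```

Evaluating withComment gives a Left (a parse error on the second line), while withoutComment gives Right ["foo","bar"].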

The solution is to let the separator first consume a newline, and then as many comment lines as follow:

final = do
    skipMany comments
    sepEndBy1 logLine (newline >> skipMany comments)

By allowing the sequence to be ended by a separator (sepEndBy1 instead of sepBy1), any comment lines after the final LogLine are automatically skipped.
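For a self-contained version to experiment with, the following sketch wraps the final parser above in enough scaffolding to run. The question does not show LogLine, plainValue, or bracketedValue, so the definitions here are guesses made only for illustration. The sketch also differs from the question in three ways: fields are separated with plain ' ' characters (gap) so that the newline is left for the separator, comments returns (), and final ends with eof so that leftover input is reported.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the LogLine type, which the question does not show.
data LogLine = LogLine String [String] [String] String String
  deriving (Show, Eq)

-- Assumed token shape: a run of letters, digits, '.', or '_'.
plainValue :: Parser String
plainValue = many1 (alphaNum <|> oneOf "._")

-- Assumed token shape: comma-separated plain values inside [ ].
bracketedValue :: Parser [String]
bracketedValue = between (char '[') (char ']') (plainValue `sepBy` char ',')

-- One or more ' ' characters; unlike `space`, this never eats a newline.
gap :: Parser ()
gap = skipMany1 (char ' ')

logLine :: Parser LogLine
logLine = do
  name <- plainValue
  gap
  args1 <- bracketedValue
  gap
  args2 <- bracketedValue
  gap
  constant <- plainValue
  gap
  weighting <- plainValue
  return (LogLine name args1 args2 constant weighting)

-- A single-line comment: "--" up to and including the newline.
comments :: Parser ()
comments = string "--" >> manyTill anyChar newline >> return ()

-- Leading comments, then log lines separated (and optionally ended) by a
-- newline followed by any number of comment lines.
final :: Parser [LogLine]
final = do
  skipMany comments
  logLines <- sepEndBy1 logLine (newline >> skipMany comments)
  eof
  return logLines

sampleInput :: String
sampleInput = unlines
  [ "-- header"
  , "foo [a,b] [c] k 0.5"
  , "-- note"
  , "-- another"
  , "bar [x] [y] k2 1.5"
  , "-- trailing"
  ]
```

With these definitions, parse final "" sampleInput yields the two log lines and drops all four comment lines. Since comments uses manyTill anyChar newline, a comment on the very last line still needs a terminating newline.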
