FParsec: how to combine parsers so that they will be matched in arbitrary order
Question
The task is to find particular key-value pairs and parse them. The pairs can occur in any order. My partially working attempt:
open FParsec
type Parser<'a> = Parser<'a, unit>
type Status = Running | Done
type Job =
    { Id: int
      Status: Status
      Count: int }
let ws = spaces
let jobId: Parser<int> = ws >>. skipStringCI "Job id" >>. ws >>. skipChar '=' >>. ws >>. pint32
let status: Parser<Status> =
    ws >>. skipStringCI "Status" >>. ws >>. skipChar '=' >>. ws >>. (
        (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done))
let count: Parser<int> = ws >>. skipStringCI "Count" >>. ws >>. skipChar '=' >>. ws >>. pint32
let parse: Parser<Job> = parse {
    do! skipCharsTillStringCI "Job id" false 1000
    let! id = jobId
    do! skipCharsTillStringCI "Status" false 1000
    let! status = status
    do! skipCharsTillStringCI "Count" false 1000
    let! count = count
    return { Id = id; Status = status; Count = count }}
[<EntryPoint>]
let main argv =
    let sample = """
Some irrelevant text.
Job id = 33
Some other text.
Status = Done
And another text.
Count = 10
Trailing text.
"""
    printfn "%A" (run parse sample)
    0
(*
result:
Success: {Id = 33;
 Status = Done;
 Count = 10;}
*)
So it works, but it has two problems: obvious duplication ("Job id" appears both in the jobId function and in the top-level parser, and so on), and it expects "Job id", "Status" and "Count" to appear in this particular order, which contradicts the requirement.
I have a strong feeling that there's an elegant solution for this.
Thanks!
Recommended answer
The first problem (duplication) can be solved with a minor refactoring. The basic idea is to wrap each parser in a wrapper that does the skipping.
Note that this code is still far from perfect; I just tried to keep the refactoring as small as possible.
let jobId: Parser<int> = pint32
let status: Parser<Status> =
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)
let count: Parser<int> = pint32
let skipAndParse prefix parser =
    skipCharsTillStringCI prefix false 1000
    >>. ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser
let parse: Parser<Job> = parse {
    let! id = skipAndParse "Job id" jobId
    let! status = skipAndParse "Status" status
    let! count = skipAndParse "Count" count
    return { Id = id; Status = status; Count = count }}
The second problem is more complicated. If you want the data lines to appear in any order, you must consider the cases where
- not all data lines are present;
- a certain data line appears twice or more.
To handle this, you need to produce a list of the data lines found, check that everything required is there, and decide what to do with any duplicates.
Note that a data-line parser can no longer include the "skip" part, since it might skip over an informative line before reaching the one it is looking for.
let skipAndParse2 prefix parser =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser
// A DU that records which kind of data line was found
type Result =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string
// The combined parser
let parse2 =
    // List of possible data line parsers.
    // Note that they are intentionally reordered.
    [
        skipAndParse2 "Count" count |>> Count
        skipAndParse2 "Status" status |>> Status
        skipAndParse2 "Job id" jobId |>> Id
        // The last parser skips a line that none of the prior parsers
        // could parse. The notFollowedByEof guard is needed because
        // restOfLine succeeds without consuming input at the end of
        // input, which would lead to an infinite loop (FParsec actually
        // detects this and raises an exception).
        restOfLine true .>> notFollowedByEof |>> Irrelevant
    ]
    |> List.map attempt // backtrack on failure, so each parser is optional
    |> choice           // on each iteration, one of the parsers must succeed
    |> many             // a loop
Running the code:
let sample = "
Some irrelevant text.\n\
Job id = 33\n\
Some other text.\n\
Status = Done\n\
And another text.\n\
Count = 10\n\
Trailing text.\n\
"
sample |> run parse2 |> printfn "%A "
produces the following output:
Success: [Irrelevant ""; Irrelevant "Some irrelevant text."; Id 33;
Irrelevant ""; Irrelevant "Some other text."; Status Done; Irrelevant "";
Irrelevant "And another text."; Count 10; Irrelevant ""]
Further processing requires filtering out the Irrelevant elements, checking for duplicates or missing items, and either forming the Job record or raising an error.
UPDATE: a simple example of further processing that hides Result and returns a Job option instead:
// Naive implementation of the record maker.
// Returns a Job option:
// - ignores duplicate fields (uses the first one);
// - returns None if any field is missing.
let MakeJob arguments =
    let a' =
        arguments
        |> List.filter (function | Irrelevant _ -> false | _ -> true)
    try
        let theId = a' |> List.pick (function | Id x -> Some x | _ -> None)
        let theStatus = a' |> List.pick (function | Status x -> Some x | _ -> None)
        let theCount = a' |> List.pick (function | Count x -> Some x | _ -> None)
        Some { Id = theId; Status = theStatus; Count = theCount }
    with
    | :? System.Collections.Generic.KeyNotFoundException -> None
To use it, simply append the following line to the code of parse2:
|>> MakeJob
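MakeJob silently keeps the first occurrence of a duplicated field and reports a missing field only as None. If duplicates and missing lines should be reported explicitly, a stricter variant could look roughly like the sketch below. It is only an illustration and needs no FParsec: the names Line, exactlyOne and makeJobStrict are hypothetical, and the DU is called Line here instead of Result so that F#'s built-in Result type stays available.

```fsharp
// Hypothetical, self-contained sketch; not part of the original answer.
type Status = Running | Done

type Job =
    { Id: int
      Status: Status
      Count: int }

// Same shape as the answer's Result DU, renamed (assumption) to avoid
// shadowing the built-in Result<'T, 'TError> type.
type Line =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// Succeeds only when the field occurs exactly once.
let exactlyOne (name: string) (values: 'a list) : Result<'a, string> =
    match values with
    | [ x ] -> Ok x
    | [] -> Error (sprintf "missing field: %s" name)
    | _ -> Error (sprintf "duplicate field: %s (%d occurrences)" name values.Length)

// Builds a Job, or returns the list of all problems found.
let makeJobStrict (lines: Line list) : Result<Job, string list> =
    let jobId =
        lines |> List.choose (function Id x -> Some x | _ -> None) |> exactlyOne "Job id"
    let status =
        lines |> List.choose (function Status x -> Some x | _ -> None) |> exactlyOne "Status"
    let count =
        lines |> List.choose (function Count x -> Some x | _ -> None) |> exactlyOne "Count"
    match jobId, status, count with
    | Ok i, Ok s, Ok c -> Ok { Id = i; Status = s; Count = c }
    | _ ->
        // Collect every error message so all problems are reported at once.
        let errors r = match r with Error e -> [ e ] | Ok _ -> []
        Error (errors jobId @ errors status @ errors count)

// All three fields are present exactly once, so this yields an Ok value.
printfn "%A" (makeJobStrict [ Irrelevant ""; Id 33; Status Done; Count 10 ])
// "Job id" appears twice and "Status" is missing, so this yields an Error
// carrying one message for each of the two problems.
printfn "%A" (makeJobStrict [ Id 1; Id 2; Count 10 ])
```

Once the DU names are aligned with those used in parse2, such a function could be attached with |>> in the same way as MakeJob.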