FParsec: how to combine parsers so that they will be matched in arbitrary order

Question

The task is to find particular key-value pairs and parse them. The pairs can occur in any order. My partially working attempt:

open FParsec

type Parser<'a> = Parser<'a, unit>
type Status = Running | Done

type Job = 
    { Id: int
      Status: Status
      Count: int }

let ws = spaces

let jobId: Parser<int> = ws >>. skipStringCI "Job id" >>. ws >>. skipChar '=' >>. ws >>. pint32

let status: Parser<Status> = 
    ws >>. skipStringCI "Status" >>. ws >>. skipChar '=' >>. ws >>. (
        (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done))

let count: Parser<int> = ws >>. skipStringCI "Count" >>. ws >>. skipChar '=' >>. ws >>. pint32

let parse: Parser<Job> = parse {
    do! skipCharsTillStringCI "Job id" false 1000
    let! id = jobId
    do! skipCharsTillStringCI "Status" false 1000
    let! status = status
    do! skipCharsTillStringCI "Count" false 1000
    let! count = count
    return { Id = id; Status = status; Count = count }}

[<EntryPoint>]
let main argv = 
    let sample = """
Some irrelevant text.
Job id = 33
Some other text.
Status = Done
And another text.
Count = 10
Trailing text.
"""
    printfn "%A" (run parse sample)
    0
(* 
result:
 Success: {Id = 33;
 Status = Done;
 Count = 10;} 
*)

So, it works, but it has two problems: obvious duplication (the "Job id" string appears both in the jobId function and in the top-level parser, and so on), and it expects "Job id", "Status" and "Count" to appear in this particular order, which contradicts the requirement.

I have a strong feeling that there's an elegant solution for this.

Thanks!

Recommended answer

The first problem (duplication) can be solved with a minor refactoring. The basic idea is to wrap each parser in a helper that does the skipping.
Note that this code is still far from perfect; I just tried to keep the refactoring as small as possible.

let jobId: Parser<int> = pint32

let status: Parser<Status> = 
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)

let count: Parser<int> = pint32

let skipAndParse prefix parser =
    skipCharsTillStringCI prefix false 1000
    >>. ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

let parse: Parser<Job> = parse {
    let! id = skipAndParse "Job id" jobId
    let! status = skipAndParse "Status"  status
    let! count = skipAndParse "Count" count
    return { Id = id; Status = status; Count = count }}


The second problem is more complicated. If you want the data lines to appear in any order, you must handle the cases where:

  • not all data lines are present;
  • a certain data line appears twice or more.

To handle this, you need to produce a list of the data lines found, check that everything required is there, and decide what to do with any duplicates.

Note that the data line parsers can no longer include the "skip" part, since it could skip over an informative line before reaching the line it is looking for.

let skipAndParse2 prefix parser =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

// Here, you create a DU that will say which data line was found
type Result =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// here's a combinator parser
let parse2 =
    // list of possible data line parsers
    // Note they are intentionally reordered
    [
    skipAndParse2 "Count" count |>> Count
    skipAndParse2 "Status"  status |>> Status
    skipAndParse2 "Job id" jobId |>> Id
    // the trailing parser skips a line that none of the prior
    // parsers recognized.
    // The notFollowedByEof guard is needed because of how
    // restOfLine behaves at the end of input: it succeeds
    // without consuming anything, which would make `many` loop
    // forever. FParsec actually detects this and raises an
    // exception instead.
    restOfLine true .>> notFollowedByEof |>> Irrelevant
    ]
    |> List.map attempt // each parser is optional
    |> choice // on each iteration, one of the parsers must succeed
    |> many // a loop

Running the code:

let sample = "
Some irrelevant text.\n\
Job id = 33\n\
Some other text.\n\
Status = Done\n\
And another text.\n\
Count = 10\n\
Trailing text.\n\
"

sample |> run parse2 |> printfn "%A "

produces the following output:

Success: [Irrelevant ""; Irrelevant "Some irrelevant text."; Id 33;
Irrelevant ""; Irrelevant "Some other text."; Status Done; Irrelevant "";
Irrelevant "And another text."; Count 10; Irrelevant ""]
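
To check that the order of the data lines really no longer matters, the same combinator can be run on input whose lines are shuffled. The sketch below is a condensed, self-contained restatement of parse2; the names Line, keyValue, lines and relevant are assumptions made for this example (Line stands in for the Result type above), and the FParsec package is assumed to be referenced.

```fsharp
// Assumes the FParsec NuGet package is available
// (in F# Interactive: #r "nuget: FParsec").
open FParsec

type Status = Running | Done

// Hypothetical stand-in for the answer's Result type.
type Line =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

let ws : Parser<unit, unit> = spaces

let status : Parser<Status, unit> =
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)

// "<prefix> = <value>", same shape as skipAndParse2.
let keyValue prefix (p: Parser<'a, unit>) =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. p

let lines : Parser<Line list, unit> =
    [ keyValue "Count" pint32 |>> Count
      keyValue "Status" status |>> Status
      keyValue "Job id" pint32 |>> Id
      restOfLine true .>> notFollowedByEof |>> Irrelevant ]
    |> List.map attempt
    |> choice
    |> many

// Runs the parser and keeps only the recognized data lines.
let relevant input =
    match run lines input with
    | Success (found, _, _) ->
        found |> List.filter (function Irrelevant _ -> false | _ -> true)
    | Failure (msg, _, _) -> failwith msg

// The data lines come back in the order they occur in the input,
// whatever that order is.
let shuffled = "Count = 10\nnoise\nStatus = Running\nJob id = 7\nend\n"
printfn "%A" (relevant shuffled)
```

The order of the parsers inside the list only decides which one is tried first at each position; it does not constrain the order of the lines in the input.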

Further processing requires filtering Irrelevant elements, checking for duplicates or missing items, and forming the Job record, or raising errors.

UPDATE: a simple example of further processing that hides Result and returns a Job option instead:

// naive implementation of the record maker
// return Job option
// ignores duplicate fields (uses the first one)
// returns None if any field is missing
let MakeJob arguments =
    let a' =
        arguments
        |> List.filter (function |Irrelevant _ -> false | _ -> true)

    try
        let theId     = a' |> List.pick (function |Id x -> Some x | _ -> None)
        let theStatus = a' |> List.pick (function |Status x -> Some x | _ -> None)
        let theCount  = a' |> List.pick (function |Count x -> Some x | _ -> None)
        Some { Id=theId; Status = theStatus; Count = theCount }
    with
        | :? System.Collections.Generic.KeyNotFoundException -> None

To use it, append the following line to the end of the parse2 pipeline:

|>> MakeJob
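
MakeJob silently takes the first duplicate and collapses every failure into None. If the caller needs to know what went wrong, one possible variant is sketched below. It is only an illustration under stated assumptions: the names Field, exactlyOne, errorsOf and makeJobStrict are hypothetical, and Field repeats the cases of the Result type above under a different name so that F#'s built-in Result<'T,'TError> stays usable for the return value.

```fsharp
type Status = Running | Done

type Job = { Id: int; Status: Status; Count: int }

// Hypothetical rename of the answer's Result type, so that it does not
// shadow the built-in Result<'T,'TError>.
type Field =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// Exactly one value is acceptable; anything else becomes an error message.
let exactlyOne name values =
    match values with
    | [ v ] -> Ok v
    | [] -> Error (sprintf "%s is missing" name)
    | _ -> Error (sprintf "%s appears %d times" name (List.length values))

// Turns a failed check into a one-element error list.
let errorsOf result =
    match result with
    | Error e -> [ e ]
    | Ok _ -> []

// Builds a Job, or reports every missing or duplicated field.
let makeJobStrict (fields: Field list) : Result<Job, string list> =
    let jobId =
        fields |> List.choose (function Id x -> Some x | _ -> None) |> exactlyOne "Job id"
    let status =
        fields |> List.choose (function Status x -> Some x | _ -> None) |> exactlyOne "Status"
    let count =
        fields |> List.choose (function Count x -> Some x | _ -> None) |> exactlyOne "Count"
    match jobId, status, count with
    | Ok i, Ok s, Ok c -> Ok { Id = i; Status = s; Count = c }
    | _ -> Error (errorsOf jobId @ errorsOf status @ errorsOf count)

printfn "%A" (makeJobStrict [ Irrelevant ""; Count 10; Status Done; Id 33 ])
printfn "%A" (makeJobStrict [ Id 1; Id 2; Status Running ])
```

Irrelevant entries need no separate filtering step here, because List.choose already drops everything that is not the field being collected. Whether a duplicate should be an error or should fall back to the first value is a design decision that depends on the input format.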
