F# How to tokenise user input: separating numbers, units, words?

Problem description

I am fairly new to F#, but have spent the last few weeks reading reference materials. I wish to process a user-supplied input string, identifying and separating the constituent elements. For example, for this input:

XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax

the output should resemble something like a list of tuples:

[ ("XYZ", Word); ("Hotel:", Word);
("6", Number); ("nights", Word);
("at", Operator); ("220", Number);
("EUR", CurrencyCode); ("/", Operator); ("night", Word);
("plus", Operator); ("17.5", Number); ("%", PerCent); ("tax", Word) ]

Since I'm dealing with user input, it could be anything. Thus, expecting users to comply with a grammar is out of the question. I want to identify the numbers (could be integers, floats, negative...), the units of measure (optional, but could include SI or Imperial physical units, currency codes, counts such as "night/s" in my example), mathematical operators (as math symbols or as words including "at", "per", "of", "discount", etc.), and all other words.

I have the impression that I should use active pattern matching -- is that correct? -- but I'm not exactly sure how to start. Any pointers to appropriate reference material or similar examples would be great.
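
To make the active-pattern idea concrete, here is a minimal sketch that uses only the .NET regular expression library. It is one possible shape, not a definitive implementation. The lookup tables `operatorWords` and `currencyCodes`, the helper names `lexemes`, `classify` and `tokenise`, and the lexeme regex are all assumptions made for illustration. "USD" and "GBP" are placeholder entries, and negative numbers are not handled.

```fsharp
open System.Text.RegularExpressions

type Element =
    | Word of string
    | Number of string
    | Operator of string
    | CurrencyCode of string
    | PerCent of string

// Hypothetical lookup tables; extend them to suit the real input.
let operatorWords = set [ "at"; "per"; "of"; "plus"; "discount" ]
let currencyCodes = set [ "EUR"; "USD"; "GBP" ]

// Split the raw string into lexemes: a number, a run of letters
// (with an optional trailing colon), or a single symbol.
let lexemes (input: string) =
    Regex.Matches(input, @"\d+(?:\.\d+)?|[A-Za-z]+:?|[^\sA-Za-z\d]")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Partial active patterns: each returns Some when the lexeme belongs to its class.
let (|IsNumber|_|) (s: string) =
    if Regex.IsMatch(s, @"^\d+(\.\d+)?$") then Some s else None

let (|IsOperator|_|) (s: string) =
    let symbols = set [ "/"; "+"; "-"; "*" ]
    if symbols.Contains s || operatorWords.Contains(s.ToLowerInvariant())
    then Some s
    else None

let (|IsCurrency|_|) (s: string) =
    if currencyCodes.Contains s then Some s else None

// The first matching pattern wins; anything unrecognised falls through to Word.
let classify lexeme =
    match lexeme with
    | IsNumber n -> Number n
    | IsOperator o -> Operator o
    | IsCurrency c -> CurrencyCode c
    | "%" -> PerCent "%"
    | w -> Word w

let tokenise input = lexemes input |> List.map classify
```

Applied to the sample input above, `tokenise` classifies "at", "/" and "plus" as `Operator`, "EUR" as `CurrencyCode`, "6", "220" and "17.5" as `Number`, "%" as `PerCent`, and everything else as `Word`. Because unknown lexemes fall through to `Word`, the function never rejects input, which suits free-form user text.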

Recommended answer

I put together an example using the FParsec library. The example is not robust at all but it gives a pretty good picture of how to use FParsec.

open FParsec

// One case per kind of token; each carries the matched text.
type Element =
| Word of string
| Number of string
| Operator of string
| CurrencyCode of string
| PerCent of string

let parsePerCent state =
    (parse {
        let! r = pstring "%"
        return PerCent r
    }) state

let currencyCodes = [|
    pstring "EUR"
|]

let parseCurrencyCode state =
    (parse {
        let! r = choice currencyCodes
        return CurrencyCode r
    }) state

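// Operator keywords and symbols; "plus" is listed so the sample input
// yields ("plus", Operator) as the question expects.
let operators = [|
    pstring "at"
    pstring "plus"
    pstring "/"
|]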

let parseOperator state =
    (parse {
        let! r = choice operators
        return Operator r
    }) state

let parseNumber state =
    (parse {
        let! e1 = many1Chars digit
        let! r = opt (pchar '.')
        let! e2 = manyChars digit
        return Number (e1 + (if r.IsSome then "." else "") + e2)
    }) state

let parseWord state =
    (parse {
        let! r = many1Chars (letter <|> pchar ':')
        return Word r
    }) state

let elements = [| 
    parseOperator
    parseCurrencyCode
    parseWord
    parseNumber 
    parsePerCent
|]

let parseElement state =
    (parse {
        do! spaces
        let! r = choice elements
        do! spaces
        return r
    }) state

let parseElements state =
    manyTill parseElement eof state

let parse (input:string) =
    let result = run parseElements input 
    match result with
    | Success (v, _, _) -> v
    | Failure (m, _, _) -> failwith m
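
One weakness of a plain `pstring "at"` is that it also matches the first two letters of a longer word such as "attic". The following self-contained variant is a sketch of one way to tighten that, not a definitive implementation. It uses FParsec's `attempt` and `notFollowedBy` so that a keyword only counts when no letter follows it. The names `P`, `keyword`, `pOperator`, `pElements` and `tokenise` are hypothetical, and the `#r "nuget: FParsec"` line assumes an F# script run with a recent .NET SDK.

```fsharp
#r "nuget: FParsec"
open FParsec

type Element =
    | Word of string
    | Number of string
    | Operator of string
    | CurrencyCode of string
    | PerCent of string

// Fixing the user state to unit sidesteps the value restriction on parser values.
type P<'a> = Parser<'a, unit>

// Hypothetical helper: accept a keyword only when no letter follows it,
// and backtrack otherwise so that "attic" is still read as a word.
let keyword (s: string) : P<string> =
    attempt (pstring s .>> notFollowedBy letter)

let pOperator : P<Element> =
    choice [ keyword "at"; keyword "plus"; pstring "/" ] |>> Operator

let pCurrencyCode : P<Element> = keyword "EUR" |>> CurrencyCode

let pPerCent : P<Element> = pstring "%" |>> PerCent

// Digits with an optional fractional part; a trailing "." is left unconsumed.
let pNumber : P<Element> =
    pipe2 (many1Chars digit) (opt (attempt (pchar '.' >>. many1Chars digit)))
        (fun whole frac ->
            match frac with
            | Some f -> Number (whole + "." + f)
            | None -> Number whole)

let pWord : P<Element> = many1Chars (letter <|> pchar ':') |>> Word

// Try the specific token kinds first, then skip any whitespace after the token.
let pElement : P<Element> =
    choice [ pOperator; pCurrencyCode; pWord; pNumber; pPerCent ] .>> spaces

let pElements : P<Element list> = spaces >>. many pElement .>> eof

let tokenise (input: string) =
    match run pElements input with
    | Success (v, _, _) -> v
    | Failure (m, _, _) -> failwith m
```

With this variant `tokenise "attic"` yields a single `Word`, while "at" on its own is still an `Operator`. Input containing a character that none of the parsers accept still raises an exception, so a catch-all token would be needed for truly arbitrary user text.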
