F#: How to tokenise user input, separating numbers, units and words?
Question
I am fairly new to F#, but have spent the last few weeks reading reference materials. I wish to process a user-supplied input string, identifying and separating the constituent elements. For example, for this input:
XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax
the output should resemble something like a list of tuples:
[ ("XYZ", Word); ("Hotel:", Word);
("6", Number); ("nights", Word);
("at", Operator); ("220", Number);
("EUR", CurrencyCode); ("/", Operator); ("night", Word);
("plus", Operator); ("17.5", Number); ("%", PerCent); ("tax", Word) ]
Since I'm dealing with user input, it could be anything, so expecting users to comply with a grammar is out of the question. I want to identify the numbers (integers, floats, negative numbers, ...), the units of measure (optional, but could include SI or Imperial physical units, currency codes, and counts such as "night/s" in my example), mathematical operators (as math symbols or as words such as "at", "per", "of", "discount", etc.), and all other words.
I have the impression that I should use active patterns (is that correct?), but I'm not exactly sure how to start. Any pointers to appropriate reference material or similar examples would be great.
Answer
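I put together an example using the FParsec library. The example is not robust at all (for instance, pstring "at" also matches the beginning of a word such as "attic", and negative numbers are not recognised), but it gives a pretty good picture of how to use FParsec.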
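open FParsec

// The kinds of token we want to recognise.
type Element =
    | Word of string
    | Number of string
    | Operator of string
    | CurrencyCode of string
    | PerCent of string

// "%" becomes a PerCent token.
let parsePerCent state =
    (parse {
        let! r = pstring "%"
        return PerCent r
    }) state

// Known currency codes; extend this array as needed.
// The annotation fixes the user-state type to unit (what `run` uses).
let currencyCodes : Parser<string, unit>[] = [|
    pstring "EUR"
|]

let parseCurrencyCode state =
    (parse {
        let! r = choice currencyCodes
        return CurrencyCode r
    }) state

// Operators, written either as words or as symbols.
let operators : Parser<string, unit>[] = [|
    pstring "at"
    pstring "plus"
    pstring "/"
|]

let parseOperator state =
    (parse {
        let! r = choice operators
        return Operator r
    }) state

// Digits, optionally followed by a decimal point and more digits.
let parseNumber state =
    (parse {
        let! e1 = many1Chars digit
        let! r = opt (pchar '.')
        let! e2 = manyChars digit
        return Number (e1 + (if r.IsSome then "." else "") + e2)
    }) state

// A run of letters; a ':' is kept as part of the word, as in "Hotel:".
let parseWord state =
    (parse {
        let! r = many1Chars (letter <|> pchar ':')
        return Word r
    }) state

// Order matters: choice tries these in turn and takes the first that succeeds.
let elements = [|
    parseOperator
    parseCurrencyCode
    parseWord
    parseNumber
    parsePerCent
|]

// One token, skipping whitespace before and after it.
let parseElement state =
    (parse {
        do! spaces
        let! r = choice elements
        do! spaces
        return r
    }) state

// Tokens until the end of the input.
let parseElements state =
    manyTill parseElement eof state

// This shadows FParsec's `parse` builder used above, so it must be defined last.
let parse (input:string) =
    let result = run parseElements input
    match result with
    | Success (v, _, _) -> v
    | Failure (m, _, _) -> failwith m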
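Since the question asks about active patterns, here is a dependency-free sketch of that approach for comparison; it is not part of the FParsec answer above. It assumes a regular expression can split the input into raw pieces (a possibly negative decimal number, a run of letters optionally ending in ':', or any other single non-space character) and then classifies each piece with partial active patterns. The Token type, the Num/Op/Currency patterns, tokenise, and the small operator and currency vocabularies are hypothetical illustrations, not an established API.

```fsharp
open System.Text.RegularExpressions

// Hypothetical token type mirroring the Element union in the answer.
type Token =
    | Word of string
    | Number of string
    | Operator of string
    | CurrencyCode of string
    | PerCent of string

// Illustrative vocabularies (assumptions; extend as needed).
let operatorWords = set [ "at"; "per"; "of"; "plus"; "discount"; "/" ]
let currencyCodes = set [ "EUR"; "USD"; "GBP" ]

// Partial active pattern: an optional minus sign, digits, optional fraction.
let (|Num|_|) (s: string) =
    if Regex.IsMatch(s, @"^-?\d+(?:\.\d+)?$") then Some s else None

// Partial active pattern: a known operator word or symbol (case-insensitive).
let (|Op|_|) (s: string) =
    if operatorWords.Contains(s.ToLowerInvariant()) then Some s else None

// Partial active pattern: a known currency code (case-sensitive).
let (|Currency|_|) (s: string) =
    if currencyCodes.Contains s then Some s else None

// Split the raw text into pieces; "220EUR" becomes "220" and "EUR",
// and "17.5%" becomes "17.5" and "%".
let pieces (input: string) =
    Regex.Matches(input, @"-?\d+(?:\.\d+)?|[A-Za-z]+:?|\S")
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Value)
    |> List.ofSeq

// Classify each piece; the first matching pattern wins.
let classify piece =
    match piece with
    | Num n -> Number n
    | "%" -> PerCent piece
    | Op o -> Operator o
    | Currency c -> CurrencyCode c
    | w -> Word w

let tokenise input = pieces input |> List.map classify
```

With these definitions, tokenise "XYZ Hotel: 6 nights at 220EUR / night plus 17.5% tax" yields the same thirteen tokens as the list in the question, as union values rather than tuples. The design keeps splitting (regex) separate from classification (active patterns), so new categories such as units are added as another pattern in classify. One known weakness: a hyphen directly followed by digits is always read as a minus sign, so "6-5" becomes Number "6" and Number "-5".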