Parsing numbers with nom 5.0
Question
I'm trying to parse a large file (tens of GB) in a streaming fashion using Nom 5.0. One piece of the parser tries to parse numbers:
use nom::IResult;
use nom::character::streaming::{char, digit1};
// use nom::character::complete::{char, digit1};
use nom::combinator::{map, opt};
use nom::multi::many1;
use nom::sequence::{preceded, tuple};
pub fn number(input: &str) -> IResult<&str, &str> {
    map(
        tuple((
            opt(char('-')),
            many1(digit1),
            opt(preceded(char('.'), many1(digit1))),
        )),
        |_| "0",
    )(input)
}
(Obviously, it should not return "0" for every number; that's just to keep the function as simple as possible.) For this parser, I wrote a test:
#[test]
fn match_positive_integer() {
    let (_, res) = number("0").unwrap();
    assert_eq!("0", res);
}
This test fails with Incomplete(Size(1)) because the "decimals" opt() wants to read data that isn't there. If I switch to the complete versions of the matchers (as in the commented-out line), the test passes.
I assume this will actually work in production, because the parser will be fed additional data when it complains about incompleteness, but I would still like to create unit tests. Additionally, the issue would occur in production if a number happened to be the very last bit of input in a file. How do I convince a streaming Nom parser that there is no more data available?
Recommended answer
One can argue that the test in its original form is correct: the parser can't decide whether the given input is a number or not, so the parse result is in fact still undecided. In production, especially when reading large files as you do, the buffer of already-read-but-not-yet-parsed bytes might end right in the middle of what could be a number, unless it turns out not to be one. The parser then needs to preserve its current state and ask for more input so it can retry or continue. Think of Incomplete not as a final error but as "I don't know yet: this could be an error depending on the next byte, and the question can't be decided right now."
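To make the "undecided" state concrete, here is a minimal std-only sketch. It does not use nom: the `Step` enum and `digits_streaming` function are hypothetical stand-ins that only imitate how a streaming digit matcher behaves, under the assumption that a number is finished only once a non-digit has been seen.

```rust
/// Hypothetical outcome type; a simplified stand-in for a streaming parser result.
#[derive(Debug, PartialEq)]
enum Step<'a> {
    /// A run of digits followed by a non-digit: (remaining input, digits).
    Done(&'a str, &'a str),
    /// Everything seen so far is digits (or nothing at all);
    /// the next byte could still extend the number.
    Incomplete,
    /// The input does not start with a digit.
    Error,
}

/// Streaming-style digit matcher: it only commits once it has seen
/// a character that cannot belong to the number.
fn digits_streaming(input: &str) -> Step<'_> {
    match input.find(|c: char| !c.is_ascii_digit()) {
        Some(0) => Step::Error,
        Some(end) => Step::Done(&input[end..], &input[..end]),
        None => Step::Incomplete,
    }
}
```

With this sketch, the input "0" is reported as `Incomplete`, while "0;" is reported as `Done(";", "0")`: the trailing character is what lets the matcher decide.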
You can use the complete combinator on your top-level parser so that when you do in fact reach EOF, you error out on that. Incomplete results within the top-level parser should be handled, e.g. by expanding the read buffer by some margin and retrying.
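The "expand the buffer and retry" idea can be sketched with the standard library alone. Everything here is an assumption made for illustration: `Step`, `digits_streaming` and `first_number` are hypothetical names, not nom APIs, and the choice to accept a trailing digit run at EOF is one possible policy rather than the only one. `chunk` is assumed to be greater than zero.

```rust
use std::io::{self, Read};

/// Hypothetical parse outcome; a simplified stand-in for a streaming parser result.
#[derive(Debug, PartialEq)]
enum Step {
    /// A digit run of `len` bytes, followed by a non-digit byte.
    Done { len: usize },
    /// Only digits (or nothing) so far; the next byte decides.
    Incomplete,
    /// The buffer does not start with a digit.
    Error,
}

fn digits_streaming(buf: &[u8]) -> Step {
    match buf.iter().position(|b| !b.is_ascii_digit()) {
        Some(0) => Step::Error,
        Some(len) => Step::Done { len },
        None => Step::Incomplete,
    }
}

/// Reads the leading number from `src`, growing the buffer by `chunk` bytes
/// each time the parser reports Incomplete, and deciding for good at EOF.
fn first_number<R: Read>(mut src: R, chunk: usize) -> io::Result<Option<String>> {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        match digits_streaming(&buf) {
            Step::Done { len } => {
                return Ok(Some(String::from_utf8_lossy(&buf[..len]).into_owned()));
            }
            Step::Error => return Ok(None),
            Step::Incomplete => {
                // Extend the buffer by some margin and try to fill it.
                let old_len = buf.len();
                buf.resize(old_len + chunk, 0);
                let n = src.read(&mut buf[old_len..])?;
                buf.truncate(old_len + n);
                if n == 0 {
                    // EOF: no further byte can change the outcome, so the
                    // digits collected so far (if any) are the final answer.
                    if buf.is_empty() {
                        return Ok(None);
                    }
                    return Ok(Some(String::from_utf8_lossy(&buf).into_owned()));
                }
            }
        }
    }
}
```

For example, `first_number(&b"12345"[..], 2)` keeps retrying as the buffer grows and only settles on "12345" once the reader signals EOF, which is the one place where "no more data is available" is actually known.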
You can wrap the parser in a complete() parser local to the current unit test and test on that. Something along the lines of:
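use nom::combinator::complete;

#[test]
fn match_positive_integer() {
    // complete() takes the parser itself and returns a new parser, which is then applied to the input.
    // Caveat: complete() maps Incomplete to Err::Error, so unwrap() only succeeds if `number` can finish on this input.
    let (_, res) = complete(number)("0").unwrap();
    assert_eq!("0", res);
}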