Best parser generator for parsing many small texts in C++?

Question

I am, for performance reasons, porting a C# library to C++. During normal operation, this library needs, amongst other things, to parse about 150'000 math expressions (think Excel formulas) with an average length of less than 150 characters.

In the C# version, I used GOLD parser to generate parsing code. It can parse all 150'000 expressions in under one second.

Because we were thinking about extending our language, I figured the move to C++ might be a good chance to change to ANTLR. I have ported the (simple) grammar to ANTLR and generated C code out of it. Parsing the 150'000 expressions takes over 12 seconds, because for each expression, I need to create a new ANTL3_INPUT_STREAM, token stream, lexer and parser - there is, at least in version 3.4, no way to reuse them.

I'd be grateful if someone could recommend what to use instead. GOLD is of course an option, though generating C++ or C code with it seems a lot more complicated than generating C#. My grammar is LALR and LL(1) compatible. My paramount concern is parsing performance on small inputs.
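
Because the grammar is LL(1) compatible, a hand-written recursive-descent parser is also an option (Boost.Spirit, recommended below, likewise builds recursive-descent parsers). The following is only a minimal sketch under stated assumptions: the real formula grammar is not shown, so a hypothetical stand-in (numbers, `+ - * /`, parentheses, unary minus) is used, and `ExprParser` is a hypothetical name. What it illustrates is the point raised about ANTLR 3.4: a single parser object is reset and reused for every input, so there is no per-expression construction of streams, lexer and parser.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical LL(1) stand-in grammar (not the question's actual grammar):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor
class ExprParser {
public:
    // Resets the cursor and evaluates one expression. The same object can be
    // reused for every input; returns nullopt on a syntax error.
    std::optional<double> parse(std::string_view text) {
        src_ = text;
        pos_ = 0;
        ok_ = true;
        double value = expr();
        skip_ws();
        if (!ok_ || pos_ != src_.size()) return std::nullopt;
        return value;
    }

private:
    std::string_view src_;
    std::size_t pos_ = 0;
    bool ok_ = true;

    void skip_ws() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }
    // One character of lookahead is enough to pick every production (LL(1)).
    char peek() {
        skip_ws();
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }
    double expr() {
        double v = term();
        for (char c = peek(); ok_ && (c == '+' || c == '-'); c = peek()) {
            ++pos_;
            double rhs = term();
            v = (c == '+') ? v + rhs : v - rhs;
        }
        return v;
    }
    double term() {
        double v = factor();
        for (char c = peek(); ok_ && (c == '*' || c == '/'); c = peek()) {
            ++pos_;
            double rhs = factor();
            v = (c == '*') ? v * rhs : v / rhs;
        }
        return v;
    }
    double factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            double v = expr();
            if (peek() != ')') { ok_ = false; return 0; }
            ++pos_;
            return v;
        }
        if (c == '-') { ++pos_; return -factor(); }
        return number();
    }
    double number() {
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isdigit(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '.'))
            ++pos_;
        if (start == pos_) { ok_ = false; return 0; }
        // strtod needs a NUL-terminated buffer, so copy the digits out.
        std::string digits(src_.substr(start, pos_ - start));
        char* end = nullptr;
        double d = std::strtod(digits.c_str(), &end);
        if (*end != '\0') ok_ = false;  // e.g. "1.2.3"
        return d;
    }
};
```

Usage would be one `ExprParser` instance looping over all 150'000 formulas, calling `parse` on each. Whether this beats a generated parser for the real grammar would have to be measured; the sketch makes no performance claim.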

Answer

I would try boost::spirit. It is often extremely fast (even for parsing simple things like an integer, it can be faster than the C function atoi: http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html)

http://boost-spirit.com/home/

It has nice things: it is header-only, so there is no dependency hell, and it has a liberal licence.

However, be warned that the learning curve is steep. It's modern C++ (no pointers, but lots of templates and very frustrating compilation errors), so coming from C or C#, you might not feel very comfortable.
