How to benchmark a Boost Spirit parser?


Problem description

    I'm working on a compiler and I would like to improve its performance. I found that about 50% of the time is spent parsing the source files. Since the source files are quite small, and I do quite a lot of transformations afterwards, it seems to me there is room for improvement.

    My parser is a Boost Spirit parser with a lexer (with lexer::pos_iterator) and I have a medium-sized grammar. I'm parsing the source into an AST.

    My problem is that I have no idea what takes the most time during parsing: copying AST nodes, the lexer, the parser rules, or memory.

    I don't think it is an I/O problem, since I'm working on an SSD and I read the file entirely at the beginning, then use only the in-memory version.

    I tried using profilers, but the methods that take time are some methods from Boost with names hundreds of characters long, and I don't know exactly what they do...

    So, is there a preferred way to benchmark a Boost Spirit parser and its grammar? Or are there some rules that can be used to verify the efficiency at specific points?
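    A simple first step, independent of any profiler, is to time the parse in isolation with std::chrono, repeating it enough times to dominate timing noise. This is a minimal sketch only; parse_source here is a hypothetical stand-in for the project's real parser entry point (e.g. parser.parse):

    ```cpp
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Hypothetical stand-in for the real parser entry point; replace with
    // e.g. parser.parse(fname, program, nullptr) from the project.
    static bool parse_source(const std::string& source) {
        // Pretend to do some work proportional to the input size.
        volatile std::size_t n = 0;
        for (char c : source) n += static_cast<unsigned char>(c);
        return n > 0;
    }

    // Run the parse `iterations` times and return the average time in ms.
    static double average_parse_ms(const std::string& source, int iterations) {
        using clock = std::chrono::steady_clock;
        auto start = clock::now();
        for (int i = 0; i < iterations; ++i)
            parse_source(source);
        auto elapsed = clock::now() - start;
        double total_ms =
            std::chrono::duration<double, std::milli>(elapsed).count();
        return total_ms / iterations;
    }

    int main() {
        std::string source(1 << 16, 'x');  // dummy 64 KiB "source file"
        std::printf("average parse time: %.3f ms\n",
                    average_parse_ms(source, 100));
        return 0;
    }
    ```

    Timing construction of the lexer/grammar separately from the parse loop itself is what reveals the kind of fixed setup cost discussed in the accepted answer below.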

    Thanks

    Sources for those interested:

    Solution

    I have given things a quick scan.

    My profiler quickly told me that constructing the grammar and (especially) the lexer object took quite some resources.

    Indeed, just changing a single line in SpiritParser.cpp saved 40% of execution time [1] (~28s down to ~17s):

        lexer::Lexer lexer;
    

    into

        static const lexer::Lexer lexer;
    
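    The reason this helps is that a function-local static is constructed only once, on first use, no matter how many times the surrounding code runs. A stdlib-only sketch of the pattern, with the expensive lexer replaced by a hypothetical Expensive type that counts its constructions:

    ```cpp
    #include <cassert>
    #include <cstdio>

    // Hypothetical stand-in for an expensive-to-construct object such as
    // the Spirit lexer or grammar.
    struct Expensive {
        static int constructions;  // how many times the ctor has run
        Expensive() { ++constructions; /* imagine heavy table-building here */ }
    };
    int Expensive::constructions = 0;

    // Constructed once, on first call, then reused on every later call.
    static const Expensive& shared_instance() {
        static const Expensive instance;
        return instance;
    }

    int main() {
        for (int i = 0; i < 1000; ++i)
            shared_instance();  // no repeated construction cost
        std::printf("constructions: %d\n", Expensive::constructions);
        assert(Expensive::constructions == 1);
        return 0;
    }
    ```

    Note the const: sharing one instance across parses is only safe if the object is effectively stateless, which is exactly why the next step below makes the grammar stateless.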

    Now,

    • making the grammar static involves making it stateless. I made this happen by

      • moving position_begin into qi::_a (using qi::locals) and
      • passing it in as an inherited attribute at the appropriate times

        • the EDDIGrammar and ValueGrammar grammars, e.g.

          start %= qi::eps [ qi::_a = qi::_r1 ] >> program;
          

      • as well as the individual rules from ValueGrammar that are being used externally.

      This had a number of suboptimal side effects:

      • rule debugging is commented out because the lexer::pos_iterator_type has no default output streaming overload
      • the qi::position(position_begin) expression has been 'faked' with a rather elaborate replacement:

        auto local_pos = qi::lazy (
                boost::phoenix::construct<qi::position>(qi::_a)
            );
        

        That doesn't seem ideal. (Ideally, one would like to replace qi::position by a modified custom parser directive that knows how to get begin_position from qi::locals (?) so there would be no need to invoke a parser expression lazily.)

    Anyways, implementing these further changes [2] shaved off another ~15% of execution time:

    static const lexer::Lexer lexer;
    static const parser::EddiGrammar grammar(lexer); 
    
    try {
        bool r = spirit::lex::tokenize_and_parse(
                position_begin, position_end,
                lexer, 
                grammar(boost::phoenix::cref(position_begin)),
                program);
    

    Loose ideas:

    • Have you considered generating a static lexer (see "Generating the Static Analyzer" in the Boost Spirit Lex documentation)?
    • Have you considered using expectation points to potentially reduce the amount of backtracking? (Note: I didn't measure anything in that area.)
    • Have you considered alternatives for Position::file and Position::theLine? Copying the strings seems heavier than necessary. I'd prefer to store const char*. You could also look at Boost.Flyweight.
    • Is the pre-skip really required inside your qi::position directive?
    • (Somewhat non-serious: have you considered porting to Spirit X3? It seems to promise potential benefits in the form of move semantics.)
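    On the Position::file point: the idea is to store each distinct filename once and hand out cheap pointers instead of copying a std::string into every AST node. A stdlib-only sketch of such an interner (Boost.Flyweight generalizes this idea; the Interner name here is hypothetical):

    ```cpp
    #include <cassert>
    #include <string>
    #include <unordered_set>

    // Minimal string interner: each distinct filename is stored exactly once
    // and every Position can hold a cheap const char* into that storage.
    class Interner {
    public:
        const char* intern(const std::string& s) {
            // unordered_set keeps element addresses stable, so the returned
            // pointer stays valid for the interner's lifetime.
            return pool_.insert(s).first->c_str();
        }
    private:
        std::unordered_set<std::string> pool_;
    };

    int main() {
        Interner files;
        const char* a = files.intern("test/cases/foo.eddi");
        const char* b = files.intern("test/cases/foo.eddi");
        assert(a == b);  // same file -> same pointer, no per-node string copies
        return 0;
    }
    ```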

    Hope this helps.


    [1] When parsing all test cases in test/cases/*.eddi 100 times like so (github):

    for (int i = 0; i < 100; i++)
        for (auto& fname : argv)
        {
            eddic::ast::SourceFile program;
            std::cout << fname << ": " << std::boolalpha
                      << parser.parse(fname, program, nullptr) << "\n";
        }
    

    Timed with a simple

    time ./test ../../test/cases/*.eddi | md5sum
    

    With the md5sum acting as a sanity check.

    [2] I created a pull request with the proof-of-concept refactorings here https://github.com/wichtounet/eddic/pull/52
