Parsing a tokenized free form grammar with Boost.Spirit


Question

I'm stuck trying to create a Boost.Spirit parser for the output of the callgrind tool, which is part of valgrind. Callgrind outputs a domain specific embedded programming language (DSEL) that lets you do all sorts of cool stuff, such as custom expressions for synthetic counters, but it's not easy to parse.

I've placed some sample callgrind output at https://gist.github.com/ned14/5452719#file-sample-callgrind-output. My current best attempt at a Boost.Spirit lexer and parser is at https://gist.github.com/ned14/5452719#file-callgrindparser-hpp and https://gist.github.com/ned14/5452719#file-callgrindparser-cxx. The lexer part is straightforward: it tokenises tag-values, non-whitespace text, comments, ends of lines, integers, hexadecimals, floats and operators (ignore the punctuators in the sample code; they're unused). Whitespace is skipped.

So far so good. The problem is parsing the tokenised input stream. I haven't even attempted the main stanzas yet; I'm still trying to parse the tag-values, which can occur at any point in the file. Tag-values look like this:

tagtext: unknown series of tokens<eol>

It could be freeform text, e.g.

desc: I1 cache: 32768 B, 64 B, 8-way associative, 157 picosec hit latency

In this situation you'd want to convert the set of tokens to a string, i.e. to an iterator_range, and extract it.

It could, however, be an expression, e.g.

event: EPpsec = 316 Ir + 1120 I1mr + 1120 D1mr + 1120 D1mw + 1362 ILmr + 1362 DLmr + 1362 DLmw

This says that from now on, event EPpsec is to be synthesised as Ir multiplied by 316, added to I1mr multiplied by 1120, added to ... etc.

The point I want to make here is that tag-value pairs need to be accumulated as arbitrary sets of tokens and post-processed later into whatever form we need.

To that end, Boost.Spirit's utree class looked like exactly what I wanted, and that's what the sample code uses. But on VS2012, using the November CTP compiler with variadic templates, I'm currently seeing this compile error:

1>C:\Users\ndouglas.RIMNET\documents\visual studio 2012\Projects\CallgrindParser\boost\boost/range/iterator_range_core.hpp(56): error C2440: 'static_cast' : cannot convert from 'boost::spirit::detail::list::node_iterator<const boost::spirit::utree>' to 'base_iterator_type'
1>          No constructor could take the source type, or constructor overload resolution was ambiguous
1>          C:\Users\ndouglas.RIMNET\documents\visual studio 2012\Projects\CallgrindParser\boost\boost/range/iterator_range_core.hpp(186) : see reference to function template instantiation 'IteratorT boost::iterator_range_detail::iterator_range_impl<IteratorT>::adl_begin<const Range>(ForwardRange &)' being compiled
1>          with
1>          [
1>              IteratorT=base_iterator_type
1>  ,            Range=boost::spirit::utree
1>  ,            ForwardRange=boost::spirit::utree
1>          ]

... which suggests that my base_iterator_type, which is a Boost.Spirit multi_pass<> wrapper around an istreambuf_iterator to give it forward iterator behaviour, is somehow not understood by Boost.Spirit's utree implementation. The thing is, I'm not sure whether this is my bad code or bad Boost.Spirit code, seeing as line_pos_iterator<> was failing to correctly specify its forward_iterator concept tag.

Thanks to past Stack Overflow help I could write a pure non-tokenised grammar, but it would be brittle. The right solution is to tokenise and use a freeform grammar capable of handling fairly arbitrary input. Sadly, there are very few real-world (rather than toy) examples of getting Boost.Spirit's Lex and Grammar working together to achieve this. Therefore any help would be greatly appreciated.

Niall

Recommended answer

The token attribute exposes a variant which, in addition to the base-iterator range, can assume the types declared in the token_type typedef:

typedef lex::lexertl::token<base_iterator_type, mpl::vector<std::string, int, double>> token_type;

So: string, int and double. Note, however, that coercion into one of the possible types only occurs lazily, when the parser actually uses the value.

A utree is a very versatile container [1]. Hence, when you expose a spirit::utree attribute on a rule and the token value variant contains an iterator_range, Qi attempts to assign that range into the utree object. This fails because the iterators are ... 'funky'.

The easiest way to get your desired behaviour is to force Qi to interpret the attribute of the tag token as a string, and have that assigned to the utree. The following line is therefore a fix that makes compilation succeed:

    unknowntagvalue = qi::as_string[tok.tag] >> restofline;

Notes

Having said all this, I would suggest the following:


  • Consider using the Nabialek Trick to dispatch different lazy rules depending on the tag matched. This makes it unnecessary to deal with raw utrees later on.

  • You might have had success specializing the boost::spirit::traits::assign_to_XXXXXX traits (see the documentation at http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/advanced/customize/assign_to.html).

  • Consider using a pure Qi parser. While I can "feel" your sentiment that "it is going to be brittle" [2], it seems you have already demonstrated that the lexer raises the complexity to such a degree that it might not have net merit:


  • the unexpected ways in which attributes materialize (this question)
  • the problem with line-pos iterators (this is a frequently asked question, and AFAIR it has mostly hard or inelegant solutions)
  • the inflexibility regarding e.g. ad-hoc debugging (access to source data in semantic actions), switching/disabling skippers etc.
  • my personal experience was that looking at lexer states to drive these isn't helpful, because switching lexer state can only work reliably from lexer token semantic actions, whereas the disambiguation would often happen in the Qi phase

But I digress :)

[1] e.g. they have facilities for very lightweight 'referencing' of iterator ranges (e.g. for symbols, or to avoid copying characters from a source buffer into the attribute unless wanted)

[2] In effect, only because using a sequential lexer (scanner) vastly reduces the number of backtracking opportunities, which simplifies the mental model of the parser. However, you can use expectation points to much the same effect.
