解析C ++的复杂性 [英] complexity of parsing C++

查看:158
本文介绍了解析C ++的复杂性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

出于好奇,我想知道关于解析C ++的一些理论结果。

Out of curiosity, I was wondering what were some "theoretical" results about parsing C++.

让n是我的项目的大小,但由于我们处理big-O并不是很重要)

Let n be the size of my project (in LOC, for example, but since we'll deal with big-O it's not very important)


  • C ++是否在O(n)中解析?

  • 在O(n)中解析C(或Java或其语法意义上的任何更简单的语言)?

  • Will C ++ 1x是否会引入新的功能,更难以解析?

>

推荐答案

我认为解析这个词被不同的人以不同的方式解释为问题的目的。

I think the term "parsing" is being interpreted by different people in different ways for the purposes of the question.

在一个狭窄的技术意义上,解析只是验证源代码匹配语法(或者甚至构建树)。

In a narrow technical sense, parsing is merely verifying the the source code matches the grammar (or perhaps even building a tree).

有一个相当普遍的民俗定理,说你不能解析C ++(在这个意义上),因为你必须解析某些符号的含义来解析。

它源于使用weak(LALR或回溯递归下降)解析器,如果他们提交错误选择一个局部不明确的文本部分的几个可能的subparse(这 SO线程讨论一个例子),由于有时做出选择,完全失败。使用这样的解析器解决困境的方式是在解析发生时收集符号表数据,并且对解析过程进行额外的检查,以迫使解析器在遇到这样的选择时做出正确的选择。这样做的代价是显着纠缠不清的名称和类型解析与解析,这使得构建这样的解析器真的很难。但是,至少对于传统的GCC,他们使用LALR,这是解析的线性时间,我不认为那么昂贵,如果你包括解析器的名称/类型捕获(解析后还有更多的事情,因为我'

It arises from the use of "weak" (LALR or backtracking recursive descent) parsers, which, if they commit to the wrong choice of several possible subparse of a locally ambiguous part of text (this SO thread discusses an example), fail completely by virtue of sometimes making that choice. The way those that use such parser resolve the dilemma is collect symbol table data as parsing occurs and mash extra checking into the parsing process to force the parser to make the right choice when such choice is encountered. This works at the cost of significantly tangling name and type resolution with parsing, which makes building such parsers really hard. But, at least for legacy GCC, they used LALR which is linear time on parsing and I don't think that much more expensive if you include the name/type capture that the parser does (there's more to do after parsing because I don't think they do it all).

至少有两个C ++解析器的实现使用pure GLR解析技术,它简单地承认解析可能是局部模糊的,并捕获多个解析而没有评论或重大开销。 GLR解析器是线性时间,其中没有局部模糊度。它们在歧义区域中更昂贵,但是实际上,标准C ++程序中的大多数源文本落入线性时间部分。因此,有效速率是线性的,甚至捕获模糊。两个实现的解析器在解析后使用名称和类型解析,并使用不一致来消除不正确的歧义解析。
(这两种实现是 Elsa 我们的(SD的)C ++前端。我不能说Elsa的当前能力(我不认为它已经更新了几年),但我们做的所有C ++ 11 包括GCC和微软的变体)。

There are at least two implementations of C++ parsers done using "pure" GLR parsing technology, which simply admits that the parse may be locally ambiguous and captures the multiple parses without comment or significant overhead. GLR parsers are linear time where there are no local ambiguities. They are more expensive in the ambiguity region, but as a practical matter, most the of source text in a standard C++ program falls into the "linear time" part. So the effective rate is linear, even capturing the ambiguities. Both of the implemented parsers do name and type resolution after parsing and use inconsistencies to eliminate the incorrect ambiguous parses. (The two implementations are Elsa and our (SD's) C++ Front End. I can't speak for Elsa's current capability (I don't think it has been updated in years), but ours does all of C++11 including GCC and Microsoft variants).

采取硬计算机科学定义,一种语言被延伸定义为一个任意的字符串(语法应该是简洁的方式编码的密集),并将解析为检查程序的语法是正确的然后对于C ++你展开模板以验证每个实际完全展开。有一个图灵机隐藏在模板中,因此在理论上检查一个C ++程序是否有效是不可能的(没有时间限制)。真正的编译器(遵守标准)对它们将会做多少模板展开的地方设置固定的约束,因此真实的记忆,所以在实践中C ++编译器完成。但是他们可以花费任意长的时间来解析这个意义上的程序。我认为这是大多数人关心的答案。

If you take the hard computer science definition that a language is extensionally defined as an arbitrary set of strings (Grammars are supposed to be succinct ways to encode that intensionally) and treating parsing as "check the the syntax of the program is correct" then for C++ you have expand the templates to verify that each actually expands completely. There's a Turing machine hiding in the templates, so in theory checking that a C++ program is valid is impossible (no time limits). Real compilers (honoring the standard) place fixed constraints on how much template unfolding they'll do, and so does real memory, so in practice C++ compilers finish. But they can take arbitrarily long to "parse" a program in this sense. And I think that's the answer most people care about.

实际上,我猜大多数模板实际上很简单,所以C ++编译器可以完成其他编译器平均。只有疯狂的人在模板中写图灵机付出了严重的代价。 (意见:价格真的是把复杂的东西放在模板上的概念成本,而不是编译器的执行成本。)

As a practical matter, I'd guess most templates are actually pretty simple, so C++ compilers can finish as fast as other compilers on average. Only people crazy enough to write Turing machines in templates pay a serious price. (Opinion: the price is really the conceptual cost of shoehorning complicated things onto templates, not the compiler execution cost.)

这篇关于解析C ++的复杂性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆