正则表达式用于构建解析器吗? [英] Are regular expressions used to build parsers?

查看:93
本文介绍了正则表达式用于构建解析器吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这只是出于好奇而提出的一个问题,因为最近我一直需要越来越多地解析和使用regex.说,当被问到与正则表达式有关的问题时,正则表达式不适合这样做,而应使用这样的解析器"……随着我逐渐了解正则表达式,我认为大多数事情都是可能的,只是它的复杂性和时间性这很麻烦,因为您必须考虑许多不同的可能性,当然,它必须与条件语句和循环结合以构建任何类型的解析器..所以我想知道regex是用于构建大多数解析器还是还有其他方法在使用..我只是在想,因为我可能需要构建一些相当复杂的自定义解析器,而这些解析器不一定要使用现有的解析器.

This is just a question out of curiosity since I have been needing to get more and more into parsing and using regex lately.. it seems, for questions I come across in my searches regarding parsing of some sort, someone always ends up saying, when asked something relating to regex, "regex isn't good for that, use such and such parser instead"... as I have come to better understand regex, I think most stuff is possible, just its rather complex and time consuming since you have to account for many different possiblities, and of course, it has to be combined with conditional statements and loops to build any sort of parser.. so I'm wondering if regex is what is used to build most parsers or is there some other method being used.. I am just wondering since I may have the need to build some fairly complex custom parsers coming up where there isn't necessarily an existing one to use.

感谢您提供任何信息,因为我似乎找不到直接的答案.

thanks for any info as I can't seem to find a direct answer to this.

推荐答案

通常,您将使用两种(至少)类型的工具来构建解析器.

Typically, you'll use two (at least) types of tools in building your parser.

第一部分是词法分析-将字符分成标记并过滤掉注释和空格.该部分通常使用正则表达式完成.嗯,这通常是使用扫描器生成器完成的,该生成器将成对的正则表达式和代码对转换为一个程序,当它识别出正则表达式时就执行相应的代码.事实证明,这比每次测试每个正则表达式更有效,并且由于其他各种原因,它也更好地工作. FLEX是C语言中常用的工具.

The first part is lexical analysis -- separating characters into tokens and filtering out comments and whitespace. That part is typically done with regular expressions. Well, it's even more typically done using a scanner generator that converts a collection of pairs of regular expressions and code into a program that executes the corresponding code when it recognizes the regular expressions. This turns out to be more efficient than testing each regular expression each time, and it also works better for various other reasons. FLEX is a common tool for this in C.

解析器的第二部分是语法.最典型的工具是另一个程序生成器,它接受上下文无关的语法(CFG),该语法带有注释,用于解释词性"部分. CFG能够表达诸如平衡括号之类的东西,而正则表达式则不能(除非已通过CF功能对其进行了扩展,因此从数学意义上讲并非严格意义上的正则").但是带有规则的CFG非常好,因为您可以将语义含义附加到语言的短语结构中. BISON是C语言中此部分的常用工具.

The second part of your parser is the grammar. The most typical tool for this is another program-generator that accepts a context-free grammar (CFG) annotated with rules for interpreting the component "parts of speech", as it were. A CFG is able to express things like balanced parenthesis, which a regular expression cannot (unless it's been extended with CF features, making it not strictly "regular" in the mathematical sense). But a CFG with rules is very nice because you can attach a semantic meaning to the phrase structure of your language. BISON is a common tool for this part in C.

但是我实际上告诉了你一点谎言.您会看到,每种真正的编程语言都具有无法在无上下文框架中表达的部分.例如,您需要使用它来连接变量的定义,以使您知道要生成的指令以及对它的操作是否合法.通常认为这超出了解析的范围,但是有一些诸如属性语法"之类的东西,就像CFG所扩展的功能一样,这些功能甚至可以使这些依赖于上下文的代码更易于编写和使用.

But I actually told you a little lie. You see, every real programming language has parts that cannot be expressed within a context-free framework. For example, you need to connect the definition of a variable with the use of it so that you know what instructions to generate, and also if an operation on it is legal. That's typically considered outside the scope of parsing, but there are such things as "attribute grammars" which are like CFGs extended with features that can make even these context-dependencies much easier to code up and work with.

现在,没有规则表明您必须使用此类工具.许多简单的语法很容易用手写工具进行处理.例如,可以将LISP的S表达式简单地扫描为:

Now, there's no rule that says you HAVE to use such tools. Many simple grammars are easy enough to process with hand-written tools. For example, LISP's S-expressions can be simply scanned as:

如果以数字开头,请读取一个数字. 如果以字母开头,请阅读符号. 如果是空格,请跳过它. 如果它是开放式的,则跳过它,递归该例程以获取值,并期望封闭式的输入.

If it starts with a digit, read a number. If it starts with a letter, read a symbol. If it's a space, skip it. If it's an open-paren, then skip it, recurse this routine for a value, and expect a close paren.

好吧,字符串和其他功能还有更多复杂性,但这是基本思想.解析FORTH甚至更简单,因为它不构建递归数据结构.

Well, there are a few more complications for strings and what-have-you, but that's the basic idea. Parsing FORTH is even simpler, because it doesn't build a recursive data structure.

无论如何,这将使您继续进行任何项目.

Anyway, that should get you going on whatever your project is.

这篇关于正则表达式用于构建解析器吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆