Ruby 1.9正则表达式对上下文无关语法是否同样强大? [英] Are Ruby 1.9 regular expressions equally powerful to a context free grammar?

查看:96
本文介绍了Ruby 1.9正则表达式对上下文无关语法是否同样强大?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个正则表达式:

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x

当我针对多个字符串进行测试时,它似乎与上下文无关的语法一样强大,因为它可以正确处理递归。

When I test it against several strings, it appears to be as powerful as a context free grammar because it handles the recursion properly.

regex.match("aaacaaa")
# => #<MatchData "aaacaaa" foo:"aaacaaa">
regex.match("aacaa")
# => #<MatchData "aacaa" foo:"aacaa">
regex.match("aabcbaa")
# => #<MatchData "aabcbaa" foo:"aabcbaa">
regex.match("aaacaa")
# => nil

使用Ruby 1.9正则表达式的乐趣举了一个例子,他实际上安排了正则表达式的所有部分,使其看起来像上下文-语法如下:

"Fun with Ruby 1.9 Regular Expressions" has an example where he actually arranges all the parts of a regex so that it looks like a context-free grammar as follows:

sentence = %r{ 
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x

在他重新排列正则表达式各部分的技巧之间,以及我的示例递归命名捕获组,这是否意味着Ruby 1.9正则表达式具有等效于上下文无关语法的功能?

Between his technique for rearranging the parts of the regex, and my example of recursive named capturing groups, does this mean Ruby 1.9 regular expressions have the power equivalent to a context-free grammar?

推荐答案

关于Ruby 1.9中使用的Oniguruma regexp引擎的令人敬畏的事情之一–它具有解析器的功能,并且不仅限于识别常规语言。它具有正面和反面的前瞻性/反面性,甚至可以用来识别某些不是上下文无关的语言!以以下示例为例:

This is one of the awesome things about the Oniguruma regexp engine used in Ruby 1.9 – it has the power of a parser, and is not restricted to recognizing regular languages. It has positive and negative lookahead/lookbehind, which even can be used to recognize some languages which are not context-free! Take the following as an example:

regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

此正则表达式可识别诸如 abc, aabbcc, aaabbbccc之类的字符串– a, b和 c的数目必须相等,否则将不匹配。

This regexp recognizes strings like "abc", "aabbcc", "aaabbbccc", and so on – the number of "a", "b", and "c" must be equal, or it will not match.

(一个限制:您不能使用命名组

(One limitation: you can’t use named groups in the lookahead and lookbehind.)

虽然我还没有偷窥,但Oniguruma似乎通过简单的递归来处理已命名的组,并在某些情况下进行备份。匹配。我观察到它不能处理左递归。例如:

Although I haven’t peeked under the hood, Oniguruma seems to deal with named groups by simple recursive descent, backing up when something doesn’t match. I’ve observed that it can’t deal with left recursion. For example:

irb(main):013:0> regexp = /(?<A>\g<A>a|)/
SyntaxError: (irb):13: never ending recursion: /(?<A>\g<A>a|)/
    from C:/Ruby192/bin/irb:12:in `<main>'

我不太清楚我的解析理论,但是我认为像这样的非确定性自上而下的解析器应该能够解析任何上下文无关的语言。 (语言,而不是语法;如果语法已左递归,则必须将其转换为右递归。)如果不正确,请编辑此帖子。

I don’t remember my parsing theory very clearly, but I think that a non-deterministic top-down parser like this should be able to parse any context-free language. ("language", not "grammar"; if your grammar has left recursion, you will have to convert it to right recursion.) If that is incorrect, please edit this post.

这篇关于Ruby 1.9正则表达式对上下文无关语法是否同样强大?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆