算法从任何正则表达式生成上下文无关文法 [英] Algorithm to generate context free grammar from any regex

查看：528 发布时间：2015/11/30 14:57:53 regex algorithm nlp context-free-grammar computation-theory

本文介绍了算法从任何正则表达式生成上下文无关文法的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

任何人都可以概括我的算法，可以将任何给定的正则表达式为一套对应的CFG规则？

Can anyone outline for me an algorithm that can convert any given regex into an equivalent set of CFG rules?

我知道如何解决这个基本的东西，如（A | B）*：

I know how to tackle the elementary stuff such as (a|b)*:

S -> a A
S -> a B
S -> b A
S -> b B
A -> a A
A -> a B
A -> epsilon
B -> b A
B -> b B
B -> epsilon
S -> epsilon (end of string)

不过，我有一些问题，正式到一个合适的算法尤其是更复杂的前pressions，可以有许多嵌套操作。

However, I'm having some problem formalizing it into a proper algorithm especially with more complex expressions that can have many nested operations.

推荐答案

如果你只是在谈论从理论角度常规EX pressions，有这三个结构：

If you are just talking about regular expressions from a theoretical point of view, there are these three constructs:

ab       # concatenation
a|b      # alternation
a*       # repetition or Kleene closure

那么你可能只是做了什么：

What you could then just do:

创建一个规则的S - ＆GT; （fullRegex）
在每一个重复词（X）* 在 fullRegex 创建一个规则 X - ＆GT;的x×和 X - ＆GT; ε，然后替换（X）* 与 X 。
在每一个交替（A | B | C）创建规则Ÿ - ＆GT;一个，Ÿ - ＆GT; b 和Ÿ - ＆GT; ç，然后替换（A | B | C）与是

create a rule S -> (fullRegex)
for every repeated term (x)* in fullRegex create a rule X -> x X and X -> ε, then replace (x)* with X.
for every alternation (a|b|c) create rules Y -> a, Y -> b and Y -> c, then replace (a|b|c) with Y

简单地重复这个递归（注意，所有 X， A ， B 和 C 仍然可以是复杂的常规前pressions）。请注意，当然，你必须使用唯一的标识符，每一步。

Simply repeat this recursively (note that all x, a, b and c can still be complex regular expressions). Note that of course you have to use unique identifiers for every step.

这应该是足够的。这当然不会给予最优雅或高效语法，但是这是归一化的用途（和它应该在一个单独的步骤来完成，并有<一个href="http://en.wikipedia.org/wiki/Chomsky_normal_form#Converting_a_grammar_to_Chomsky_Normal_Form">well-defined步骤做到这一点）。

This should be enough. This will certainly not give the most elegant or efficient grammar, but that is what normalization is for (and it should be done in a separate step and there are well-defined steps to do this).

一个例子： A（B | CD *（E | F）*）*

S -> a(b|cd*(e|f)*)*

S -> a X1; X1 -> (b|cd*(e|f)*) X1; X1 -> ε

S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> cd*(e|f)*

S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> c X2 (e|f)*; X2 -> d X2; X2 -> ε

... and a few more of those steps, until you end up with:

S  -> a X1
X1 -> Y1 X1
X1 -> ε
Y1 -> b
Y1 -> c X2 X3
X2 -> d X2
X2 -> ε
X3 -> Y2 X3
X3 -> ε
Y2 -> e
Y2 -> f

这篇关于算法从任何正则表达式生成上下文无关文法的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

算法从任何正则表达式生成上下文无关文法 [英] Algorithm to generate context free grammar from any regex

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

算法从任何正则表达式生成上下文无关文法 [英] Algorithm to generate context free grammar from any regex

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭