算法从任何正则表达式生成上下文无关文法 [英] Algorithm to generate context free grammar from any regex
问题描述
任何人都可以概括我的算法,可以将任何给定的正则表达式为一套对应的CFG规则?
Can anyone outline for me an algorithm that can convert any given regex into an equivalent set of CFG rules?
我知道如何解决这个基本的东西,如(A | B)*:
I know how to tackle the elementary stuff such as (a|b)*:
S -> a A
S -> a B
S -> b A
S -> b B
A -> a A
A -> a B
A -> epsilon
B -> b A
B -> b B
B -> epsilon
S -> epsilon (end of string)
不过,我有一些问题,正式到一个合适的算法尤其是更复杂的前pressions,可以有许多嵌套操作。
However, I'm having some problem formalizing it into a proper algorithm especially with more complex expressions that can have many nested operations.
推荐答案
如果你只是在谈论从理论角度常规EX pressions,有这三个结构:
If you are just talking about regular expressions from a theoretical point of view, there are these three constructs:
ab # concatenation
a|b # alternation
a* # repetition or Kleene closure
那么你可能只是做了什么:
What you could then just do:
- 创建一个规则
的S - > (fullRegex)
- 在每一个重复词
(X)*
在fullRegex
创建一个规则X - >的x×
和X - > ε
,然后替换(X)*
与X
。 - 在每一个交替
(A | B | C)
创建规则Ÿ - >一个
,Ÿ - > b
和Ÿ - > ç
,然后替换(A | B | C)
与是
- create a rule
S -> (fullRegex)
- for every repeated term
(x)*
infullRegex
create a ruleX -> x X
andX -> ε
, then replace(x)*
withX
. - for every alternation
(a|b|c)
create rulesY -> a
,Y -> b
andY -> c
, then replace(a|b|c)
withY
简单地重复这个递归(注意,所有 X,
A
, B
和 C
仍然可以是复杂的常规前pressions)。请注意,当然,你必须使用唯一的标识符,每一步。
Simply repeat this recursively (note that all x,
a
, b
and c
can still be complex regular expressions). Note that of course you have to use unique identifiers for every step.
这应该是足够的。这当然不会给予最优雅或高效语法,但是这是归一化的用途(和它应该在一个单独的步骤来完成,并有<一个href="http://en.wikipedia.org/wiki/Chomsky_normal_form#Converting_a_grammar_to_Chomsky_Normal_Form">well-defined步骤做到这一点)。
This should be enough. This will certainly not give the most elegant or efficient grammar, but that is what normalization is for (and it should be done in a separate step and there are well-defined steps to do this).
一个例子: A(B | CD *(E | F)*)*
S -> a(b|cd*(e|f)*)*
S -> a X1; X1 -> (b|cd*(e|f)*) X1; X1 -> ε
S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> cd*(e|f)*
S -> a X1; X1 -> Y1 X1; X1 -> ε; Y1 -> b; Y1 -> c X2 (e|f)*; X2 -> d X2; X2 -> ε
... and a few more of those steps, until you end up with:
S -> a X1
X1 -> Y1 X1
X1 -> ε
Y1 -> b
Y1 -> c X2 X3
X2 -> d X2
X2 -> ε
X3 -> Y2 X3
X3 -> ε
Y2 -> e
Y2 -> f
这篇关于算法从任何正则表达式生成上下文无关文法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!