具有重复字符的正则表达式 [英] Regular Expressions with repeated characters

查看:117
本文介绍了具有重复字符的正则表达式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要编写一个正则表达式,该表达式可以检测仅包含字符x,y和z的字符串,但其中的字符与其相邻字符不同。

I need to write a regular expression that can detect a string that contains only the characters x,y, and z, but where the characters are different from their neighbors.

这里是一个示例

xyzxzyz = Pass

xyzxzyz = Pass

xyxyxyx =通过

xyxyxyx = Pass

xxyzxz =失败(重复x)

xxyzxz = Fail (repeated x)

zzzxxzz =失败(重复相邻字符)

zzzxxzz = Fail (adjacent characters are repeated)

我认为这会起作用((x | y | z)?)*,但似乎不起作用。有什么建议吗?

I thought that this would work ((x|y|z)?)*, but it does not seem to work. Any suggestions?

编辑

请注意,我在寻找答案不允许向前看或向后看。唯一允许的操作是交替,串联,分组和闭包

Please note, I am looking for an answer that does not allow for look ahead or look behind operations. The only operations allowed are alternation, concatenation, grouping, and closure

推荐答案

通常,对于这种类型的问题,如果正则表达式不是简单到可以直接派生,您可以从绘制DFA开始并从那里派生正则表达式。

Usually for this type of question, if the regex is not simple enough to be derived directly, you can start from drawing a DFA and derive a regex from there.

您应该能够派生以下DFA。 q1,q2,q3,q4是结束状态,q1也是开始状态。 q5是失败/陷阱状态。

You should be able to derive the following DFA. q1, q2, q3, q4 are end states, with q1 also being the start state. q5 is the failed/trap state.

有几种方法可以查找DFA的正则表达式。我将使用 Brzozowski代数方法,如本文

There are several methods to find Regular Expression for a DFA. I am going to use Brzozowski Algebraic Method as explained in section 5 of this paper:

对于每个状态qi,等式Ri是项的并集:对于从qi到qj的 a 过渡,项为aRj。基本上,您将查看状态中的所有传出边。如果Ri是最终状态,则λ也是项之一。

For each state qi, the equation Ri is a union of terms: for a transition a from qi to qj, the term is aRj. Basically, you will look at all the outgoing edges from a state. If Ri is a final state, λ is also one of the terms.

让我引用本文定义部分的身份,因为稍后它们会派上用场(λ是空字符串,∅是空集):

Let me quote the identities from the definition section of the paper, since they will come in handy later (λ is the empty string and ∅ is the empty set):

(ab)c = a(bc) = abc
λx = xλ = x
∅x = x∅ = ∅
∅ + x = x
λ + x* = x*
(λ + x)* = x*

由于q5是一个陷阱状态,该公式将以无限递归结束,因此您可以将其放在等式中。如果最终将其包含在方程中,它将最终成为空集并消失(在附录中有解释)。

Since q5 is a trap state, the formula will end up an infinite recursion, so you can drop it in the equations. It will end up as empty set and disappear if you include it in the equation anyway (explained in the appendix).

您将得出:

R1 = xR2 + yR3 + zR4 + λ
R2 =     + yR3 + zR4 + λ
R3 = xR2 +     + zR4 + λ
R4 = xR2 + yR3       + λ

用代入和Arden求解上述方程定理,其中指出:

Solve the equation above with substitution and Arden's theorem, which states:


给出形式为 X = AX + B 在λ∉A处,等式具有解 X = A * B

Given an equation of the form X = AX + B where λ ∉ A, the equation has the solution X = A*B.

您将得到答案。

我没有时间和信心来推导出整件事,但我将展示推导的前几个步骤。

I don't have time and confidence to derive the whole thing, but I will show the first few steps of derivation.

通过替换删除R4,请注意zλ由于以下身份而变为z:

Remove R4 by substitution, note that zλ becomes z due to the identity:

R1 = xR2 + yR3 + (zxR2 + zyR3 + z) + λ
R2 =     + yR3 + (zxR2 + zyR3 + z) + λ
R3 = xR2 +     + (zxR2 + zyR3 + z) + λ

重新组合它们:

R1 = (x + zx)R2 + (y + zy)R3 + z + λ
R2 =       zxR2 + (y + zy)R3 + z + λ
R3 = (x + zx)R2 +       zyR3 + z + λ

将Arden定理应用于R3:

Apply Arden's theorem to R3:

R3 = (zy)*((x + zx)R2 + z + λ)
   = (zy)*(x + zx)R2 + (zy)*z + (zy)*

您可以将R3替换为R2和R1并删除R3。我把其余的留给运动。

You can substitute R3 back to R2 and R1 and remove R3. I leave the rest as exercise. Continue ahead and you should reach the answer.

我们将解释为什么可以从中丢弃陷阱状态等式,因为它们无论如何都会消失。让我们以DFA中的状态q5为例。

We will explain why trap states can be discarded from the equations, since they will just disappear anyway. Let us use the state q5 in the DFA as an example here.

R5 = (x + y + z)R5

使用身份∅+ x = x

R5 = (x + y + z)R5 + ∅

将Arden定理应用于R5:

Apply Arden's theorem to R5:

R5 = (x + y + z)*∅

使用身份∅x=x∅=∅

R5 = ∅

当R5代入其他方程式时,∅x=x∅=∅的身份也将生效,从而导致R5消失。

The identity ∅x = x∅ = ∅ will also take effect when R5 is substituted into other equations, causing the term with R5 to disappear.

这篇关于具有重复字符的正则表达式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆