算法创建的正则表达式嵌套模式的第n个级别 [英] Algorithm to create n-th level of nested patterns in RegEx

查看:97
本文介绍了算法创建的正则表达式嵌套模式的第n个级别的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

正如在解释可以定期前pressions用于匹配嵌套模式?时,它不可能创建正则表达式来匹配任意嵌套模式。但是,它可以创建一个算法,将产生nesteness的第n个级别?

As explained in Can regular expressions be used to match nested patterns?, it is not possible to create regex to match arbitrary nested pattern. But is it possible to create an algorithm that would generate a regex of n-th level of "nesteness"?

基本上,我想替换修剪(无论) RTRIM(LTRIM(其他))

我设法创建3个级别的手(JavaScript语法):

i managed to create 3 levels by hand (javascript syntax):

level[1] = /\(([^()]*)\)/g
level[2] = /\(((?:[^()]*\([^()]*\))*[^()]*)\)/g
level[3] = /\(((?:(?:(?:[^()]*\([^()]*\))*[^()]*)*\((?:(?:[^()]*\([^()]*\))*[^()]*)*\))*[^()]*)\)/g

下面是一些测试数据:

1st(ddd) + 1st(ddd)
2nd(dd(d))
3rd(a(b) + (cd(h) + d(dfas) + zzz))
4th(a(b(c(d))))
8th(a(b(c(d(e(f(g()))))))

我知道,在每一级 [^()] * 需要被替换,可以包含括号的非捕获组,但我不知道如何<强>概括algoritm为n个级别

i know that at every level [^()]* needs to be replaced with noncapturing group that can contain parentheses, but i'm not sure how to generalize the algoritm for n-th level...

推荐答案

您可以考虑一下更理论上:一个匹配括号嵌套 N 深的就是圆括号左右比赛为 N-1 或更少深(至少有一个完全 N-1 深)。

You can think about it more theoretically: a match for parenthesis nested n deep is just parenthesis around matches for n-1 or less deep (with at least one exactly n-1 deep).

我们可以给正则表达式的递归定义。让 X [N] 是正则表达式嵌套完全 N 的水平,和 Y [N ] 是正则表达式包含括号嵌套最多 N 级别的任何级别的字符串,因此:

We can give a recursive definition of the regexes. Let X[n] be the regex for nesting exactly n levels, and Y[n] be the regex for a string containing brackets with any level of nesting up to n levels, so:

X[n] = \( (Y[n-2] X[n-1])+ Y[n-2] \)

Y[n] = [^()]* ( \( Y[n-1] \) [^()]* )*

Y [0] = X [0] = [^()] * (无嵌套)和 X [1] = \ ([^()] * \)。 (我不是非捕获组等细节困扰着呢,空间只是为了便于阅读。)

with Y[0] = X[0] = [^()]* (no nesting) and X[1] = \([^()]*\). (I'm not bothering with the details of non-capturing groups etc yet, and the spaces are just for readability.)

写在此基础上的算法应该是很容易的。

Writing an algorithm based on this should be quite easy.

从这些新的(减少相互递归)定义的正则表达式变长了很多的要慢得多(他们是多项式,而不是指数)。

The regexes from these new (less mutually recursive) definitions get longer much much more slowly (they are polynomial rather than exponential).

L [N] 是长度 X [N] L [N] 是长度 Y [N] ,然后(常数项只是硬codeD中的每一个字符):

Let l[n] be the length of X[n], and L[n] be the length of Y[n], then (the constant terms are just the hardcoded characters in each one):

L[n] = 19 + L[n-1] = 19*n + L[0] = 19*n + 6

l[n] = 3 + L[n-2] + l[n-1] + 2 + L[n-2] + 2
     = 7 + 2 * L[n-2] + l[n-1]
     = -57 + 38 * n + l[n-1]

相应的初始条件L [0] L [1] 。这种形式的递推关系有二次的解决方案,因此,这是唯一的为O(n ^ 2)。好多了。

with the appropriate initial conditions for l[0] and l[1]. Recurrence relations of this form have quadratic solutions, so this is only O(n^2). Much better.

(对于其他人,我有 Y [N] Y [N] = Y [N-1 previous定义] | X [N] ;这额外的递归意味着的 X 的长度正则表达式是 O(2.41 ^ N ),它吸收了不少。)

(For others, I had a previous definition of Y[n] was Y[n] = Y[n-1] | X[n]; this extra recursion meant that the length of the X regex was O(2.41^n), which sucks a lot.)

的新定义是是一个诱人的暗示,甚至有可能会写的一种方式 X 的是线性的 N 。我不知道,虽然,我有一种感觉上的 X 的确切额外的限制长度是指它是不可能的。)

(The new definition of Y is a tantalising hint that there might even be a way of writing X that is linear in n. I don't know though, and I have a feeling the extra restriction on X of exact length means it is impossible.)

下面是一些Python code,计算上述正则表达式,你也许可以将其翻译成JavaScript没有太多的麻烦。

The following is some Python code that computes the regexes above, you can probably translate it to javascript without too much trouble.

# abbreviation for the No Parenthesis regex
np = "[^()]*"

# compute Y[n] from Y[n-1]
def next_y(y_n1):
    return np + "(?:\(" + y_n1 + "\)" + np + ")*"

# compute X[n] from X[n-1] and Y[n-2]
def next_x(x_n1, y_n2):
    return "\((?:" + y_n2 + x_n1 + ")+" + y_n2 + "\)"

# compute [X[n], Y[n], Y[n-1]]
# (to allow us to make just one recursive call at each step)
def XY(n):
    if n == 0:
        return [np, # X[0]
                np, # Y[0]
                ""] # unused
    elif n == 1:
        return ["\([^()]*\)", # X[1]
                next_y(np),   # Y[1]
                np]           # Y[0]

    x_n1, y_n1, y_n2 = XY(n-1) # X[n-1], Y[n-1], Y[n-2]

    return [next_x(x_n1, y_n2), # X[n]
            next_y(y_n1),       # Y[n]
            y_n1]               # Y[n-1]

# wrapper around XY to compute just X[n]
def X(n):
    return XY(n)[0]

# wrapper around XY to compute just Y[n]
def Y(n):
    return XY(n)[1]

这篇关于算法创建的正则表达式嵌套模式的第n个级别的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆