如何将有限模式扩展为所有可能的匹配项? [英] How can I expand a finite pattern into all its possible matches?

查看:42
本文介绍了如何将有限模式扩展为所有可能的匹配项?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

例如,给定模式

[a-zA-Z]_[0-9]{2}

一个函数将采用该模式并返回一个包含

a function would take the pattern and return an array or list containing

a_00, a_01, a_02, ... , a_98, a_99, ... , z_98, z_99

只需要扩展数字和字母(及其有限的分组).我该怎么做呢?首选 Python 或 Perl 示例.谢谢!

Only numbers and letters (and finite groupings thereof) need be expanded. How would I go about doing this? An example in Python or Perl would be preferred. Thank you!

推荐答案

首先,解析表达式并构建反映正则表达式句法结构的树,并包含逻辑上出现在末尾的终止节点.例如,在 lisp 符号中,您的示例可能如下所示:

First, parse the expression and build a tree reflecting the syntactic structure of the regular expression, and include a terminator node that logically appears at the end. For example, in lisp notation, your example might look like this:

(concat
  (repeat 2
    (concat
      (charset
        (range a z)
        (range A Z))
      (literal _)
      (charset
        (range 0 9))))
  terminator)

接下来,对树进行线程化,以便您可以使用递归来生成组合扩展.我的意思是,例如(concat a .. z) 中的节点 a..z 都需要有从一个到下一个的指针,所以 a 指向b 等等,并且 concat 节点本身指向它的后继节点.思路是可以在展开中产生当前元素的一个版本,并递归到下一个元素,当下一个元素返回时可以尝试当前元素的下一个版本,以此类推,直到所有版本都用完为止你回到你的呼叫者(前任或父母).这种线程最容易通过堆栈和树的后序遍历来完成.如果在后序遍历过程中小心地压入节点,栈顶将是序列中的下一个元素.

Next, thread the tree so that you can use recursion to generate the combinatorial expansion. By that, I mean e.g. that nodes a..z in (concat a .. z) all need to have pointers from one to the next, so a points to b, and so on, and the concat node itself points to its successor. The idea is that you can produce one version of the current element in the expansion, and recurse into the next element, and when the next element returns you can try the next version of the current element, etc., until all versions are exhausted and you return to your caller (the predecessor or parent). This threading is easiest done with a stack and a post-order traversal of the tree. If you carefully push nodes during post-order traversal, the top of the stack will be the next element in sequence.

(线程的替代方法是构造树,使每个 concat 节点中的下一个元素是前一个节点的子节点,并且 repeat 节点的子节点指向回到重复节点.)

(An alternative to threading is to structure the tree so that the next element in every concat node is a child of the previous node, and repeat nodes' children point back to the repeat node.)

然后编写一个例程(或使用模式匹配的一组例程,或者如果树中的节点是使用面向对象语言中的多态实现的,则为任何给定的节点类型生成正确的输出并递归到虚拟方法中)以适当的方式下一个节点或子节点.例如,在伪代码中:

Then write a routine (or set of routines using pattern matching, or virtual methods if nodes in the tree are implemented using polymorphism in an object-oriented language) that, for any given node type, produces the correct output and recurses into the next node or children in the appropriate way. For example, in pseudocode:

if node is of form (repeat n): # non-variable repeat
    for i in 1 to n
        recurse into child
    recurse into threaded successor

if node is of form (concat ...):
    recurse into first element # last element will recurse into successor

if node is of form (literal x):
    emit x
    recurse into successor
    remove x

if node is of form (charset ...):
    for each element in charset:
        emit element
        recurse into successor
        remove element

if node is terminator:
    add string created thus far to final output list

等等.正如您所观察到的,重复节点的子节点不能递归到 repeat 节点的后继节点,因此在对树进行线程化时需要考虑到这一点.同样需要注意的是,当到达 repeat 节点的子节点末尾时,当前进度"不会丢失;或者,子节点的后继节点可以指向重复节点本身(即节点图上的真正闭包),但这需要在某处存储一个计数器.

etc. As you can observe, children of a repeat node mustn't recurse into the successor of the repeat node, so that needs to be taken into account when threading the tree. Similar care needs to be taken that "current progress" isn't lost when reaching the end of child nodes of a repeat node; alternatively, the successor of child nodes could point to the repeat node itself (i.e. a true closure over the graph of nodes), but that will require storing a counter somewhere.

总而言之,完成此操作可能需要几天时间,具体取决于它所需的灵活性和性能.另请注意,某些构造,例如 Kleene 星形或闭包(扩展正则表达式中的 *+)将导致无限列表.支持生成器(例如 C# 的迭代器语法)或协程/延续(例如 Scheme 的 call/cc)或惰性求值(例如 Haskell)的语言可以允许对列表的前几个元素进行迭代,而无需对整个列表求值.或者,选择随机潜在输出而不是穷举迭代可能更适合提供与正则表达式对应的示例.

All in all, completing this could take several days, depending on how flexible and performant it needs to be. Also note that certain constructs, such as Kleene star or closure (* or + in extended regex) will result in an infinite list. A language that supports generators (e.g. C#'s iterator syntax) or coroutines / continuations (e.g. Scheme's call/cc) or lazy evaluation (e.g. Haskell) can permit iteration over the first few elements of the list without having to evaluate the entire list. Alternatively, choosing random potential output rather than exhaustive iteration may be preferable for providing examples corresponding to a regular expression.

这篇关于如何将有限模式扩展为所有可能的匹配项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆