Determining whether a regex is a subset of another
Question
I have a large collection of regular expressions that, when matched, call a particular HTTP handler. Some of the older regexes are unreachable (e.g. a.c* ⊃ abc*), and I'd like to prune them.
Is there a library that, given two regexes, will tell me whether the second is a subset of the first?
I wasn't sure this was decidable at first (it smelled like the halting problem by a different name), but it turns out that it is.
Recommended answer
This is generally called the inclusion problem. Its formal definition, as given in a paper on the subject:
The inclusion problem for R is to test, for two given expressions r, r′ ∈ R, whether r ⊆ r′.
That paper has some great information (summary: the problem is fairly complex for all but the simplest expressions). However, searching for information on the inclusion problem leads straight back to StackOverflow, where an existing answer already links to a paper describing a passable polynomial-time algorithm that should cover a lot of common cases.
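As a rough illustration of what an inclusion test r ⊆ r′ can look like, here is a minimal Python sketch. It is not the polynomial-time algorithm from the paper mentioned above. It uses Brzozowski derivatives instead, which can be exponential in the worst case. It also assumes the expressions are already built by hand as small tuples (literal, any-character, concatenation, alternation, star) and does not parse real regex syntax. All names here (`char`, `cat`, `alt`, `star`, `is_subset`) are made up for the example, and the `"\0"` in the alphabet stands in for "any character not mentioned in either expression".

```python
# Tiny regex AST built from tuples; smart constructors keep terms normalized
# so that the set of derivatives stays finite.
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty string
ANY = ("any",)       # matches any single character, like '.'

def char(c):
    return ("chr", c)

def cat(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def alt(a, b):
    # Flatten into a frozenset: associative, commutative, idempotent.
    parts = set()
    for x in (a, b):
        if x[0] == "alt":
            parts |= x[1]
        elif x != EMPTY:
            parts.add(x)
    if not parts:
        return EMPTY
    if len(parts) == 1:
        return next(iter(parts))
    return ("alt", frozenset(parts))

def star(a):
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    return False

def deriv(r, c):
    """Expression matching every w such that r matches c + w."""
    tag = r[0]
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "any":
        return EPS
    if tag == "cat":
        left = cat(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    if tag == "alt":
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, c))
        return out
    if tag == "star":
        return cat(deriv(r[1], c), r)
    return EMPTY  # "empty" and "eps" have no non-empty matches

def is_subset(r, s, alphabet):
    """True if every string matched by r is also matched by s."""
    seen, todo = set(), [(r, s)]
    while todo:
        a, b = todo.pop()
        if a == EMPTY or (a, b) in seen:
            continue
        seen.add((a, b))
        if nullable(a) and not nullable(b):
            return False  # some string is accepted by r but not by s
        for c in alphabet:
            todo.append((deriv(a, c), deriv(b, c)))
    return True

# The example from the question: a.c* versus abc*
big = cat(cat(char("a"), ANY), star(char("c")))
small = cat(cat(char("a"), char("b")), star(char("c")))
alphabet = "abc\0"  # "\0" represents every other character

print(is_subset(small, big, alphabet))  # True: abc* is shadowed by a.c*
print(is_subset(big, small, alphabet))  # False
```

In the handler-pruning scenario from the question, a regex would be a candidate for removal when `is_subset` reports that it is included in one that is tried earlier.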