Generate a regexp from a numeric range
I'd like to generate a (series of) regexp(s) from a numeric range.
Example:
1013 - 4044 =>

regexp                 matches
---------------------------------------
101[3-9]               1013 - 1019
10[2-9][0-9]           1020 - 1099
1[1-9][0-9][0-9]       1100 - 1999
[23][0-9][0-9][0-9]    2000 - 3999
40[0-3][0-9]           4000 - 4039
404[0-4]               4040 - 4044
What is the simplest algorithm?
What is the easiest way to reverse it (i.e. given the regexps, looking for the ranges)?
It would be nice to see solutions in Java, Clojure, Perl...
Thanks!
There is an online tool that generates a regex for a given range and explains how it arrives at the result. The source code is available there as well. For example:
^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
First, break into equal length ranges:
  1013 - 4044

Second, break into ranges that yield simple regexes:
  1013 - 1019
  1020 - 1099
  1100 - 1999
  2000 - 3999
  4000 - 4039
  4040 - 4044

Turn each range into a regex:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Collapse adjacent powers of 10:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Combining the regexes above yields:
  (101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])

Next we'll try factoring out common prefixes using a tree:
Parse into tree based on regex prefixes:
  . 1 0 1 [3-9]
        + [2-9] [0-9]
      + [1-9] [0-9]{2}
    + [23] [0-9]{3}
    + 4 0 [0-3] [0-9]
          + 4 [0-4]

Turning the parse tree into a regex yields:
  (1(0(1[3-9]|[2-9][0-9])|[1-9][0-9]{2})|[23][0-9]{3}|40([0-3][0-9]|4[0-4]))

We choose the shorter one as our result.

^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
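The decomposition steps can be sketched in a few lines of Python (the question asks for Java, Clojure or Perl; Python is used here only to keep the sketch short). This is not the online tool's source code. The function names `split_range`, `range_to_pattern`, `range_to_regex` and `check` are made up for illustration, the input is assumed to be non-negative integers with `lo <= hi`, and the prefix-tree factoring step is left out.

```python
import re


def split_range(lo, hi):
    """Split [lo, hi] into sub-ranges that each map to one simple pattern."""
    stops = {hi}
    # Upper bounds obtained by turning trailing digits of lo into 9s
    # (for lo = 1013: 1019, 1099, 1999, ...).
    k = 1
    stop = lo - lo % 10**k + 10**k - 1
    while stop < hi:
        stops.add(stop)
        k += 1
        stop = lo - lo % 10**k + 10**k - 1
    # Upper bounds just below hi + 1 with its trailing digits zeroed
    # (for hi = 4044: 4039, 3999, ...).
    k = 1
    stop = (hi + 1) - (hi + 1) % 10**k - 1
    while stop > lo:
        stops.add(stop)
        k += 1
        stop = (hi + 1) - (hi + 1) % 10**k - 1
    ranges, start = [], lo
    for stop in sorted(stops):
        ranges.append((start, stop))
        start = stop + 1
    return ranges


def range_to_pattern(start, stop):
    """Turn one sub-range produced by split_range into a digit pattern."""
    pattern, any_digits = "", 0
    for a, b in zip(str(start), str(stop)):
        if a == b:
            pattern += a                      # same digit in both bounds
        elif (a, b) == ("0", "9"):
            any_digits += 1                   # position may hold any digit
        elif int(b) - int(a) == 1:
            pattern += "[" + a + b + "]"      # two adjacent digits, e.g. [23]
        else:
            pattern += "[" + a + "-" + b + "]"
    if any_digits:
        pattern += "[0-9]" + (f"{{{any_digits}}}" if any_digits > 1 else "")
    return pattern


def range_to_regex(lo, hi):
    """Join the per-range patterns into one anchored alternation."""
    parts = [range_to_pattern(a, b) for a, b in split_range(lo, hi)]
    return "^(" + "|".join(parts) + ")$"


def check(lo, hi, limit=5000):
    """Brute-force check: the regex must match n exactly when lo <= n <= hi."""
    rx = re.compile(range_to_regex(lo, hi))
    return all(bool(rx.match(str(n))) == (lo <= n <= hi) for n in range(limit))


print(range_to_regex(1013, 4044))
```

For 1013 - 4044 this prints the same flat alternation as the tool. The brute-force `check` helper is a cheap way to gain confidence in a generated pattern before relying on it.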
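To reverse it, look at the character classes and take the minimum and maximum value of each alternative. Where one alternative's lower bound is exactly one more than the previous alternative's upper bound, the two ranges are contiguous and merge into one.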
^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
=> 1013 1020 1100 2000 4000 4040 lowers
   1019 1099 1999 3999 4039 4044 uppers
=> 1013 - 4044
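A rough Python sketch of the reverse direction follows. It rests on several assumptions: the regex is a flat alternation like the one above (no nested groups, so the prefix-factored form is not handled), each alternative consists only of literal digits, digit character classes and `{n}` counts, and each alternative matches one contiguous range, which holds for generated patterns but not for arbitrary ones. The names `alternative_bounds` and `regex_to_ranges` are hypothetical.

```python
import re

# One token: a literal digit or a character class, optionally followed by {n}.
TOKEN = re.compile(r"(\[[^\]]+\]|\d)(?:\{(\d+)\})?")


def alternative_bounds(alt):
    """Return (lowest, highest) number matched by one alternative."""
    matches = list(TOKEN.finditer(alt))
    if "".join(m.group(0) for m in matches) != alt:
        raise ValueError("unsupported syntax in alternative: " + alt)
    lo = hi = ""
    for m in matches:
        # The digits written in the token give its smallest and largest value,
        # e.g. [3-9] -> 3 and 9, [23] -> 2 and 3, a literal 4 -> 4 and 4.
        digits = [c for c in m.group(1) if c.isdigit()]
        count = int(m.group(2) or 1)
        lo += min(digits) * count
        hi += max(digits) * count
    return int(lo), int(hi)


def regex_to_ranges(regex):
    """Compute bounds per alternative, then merge contiguous ranges."""
    body = regex.strip("^$")
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    bounds = sorted(alternative_bounds(a) for a in body.split("|"))
    merged = [list(bounds[0])]
    for lo, hi in bounds[1:]:
        if lo <= merged[-1][1] + 1:          # touches or overlaps previous range
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


rx = "^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$"
print(regex_to_ranges(rx))
```

Because the six alternatives are contiguous, they collapse into the single range 1013 - 4044. Alternatives that leave a gap come back as separate ranges.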