用于数字范围的正则表达式生成器 [英] a regular expression generator for number ranges

查看:611
本文介绍了用于数字范围的正则表达式生成器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我检查了stackExchange描述,算法问题是允许的主题之一。这样就可以了。

I checked on the stackExchange description, and algorithm questions are one of the allowed topics. So here goes.

给出一个范围的输入,在该范围中,开始数字和结束数字具有相同的数字位数(例如2、3或4),编写代码以生成一组正则表达式,然后依次对一个数字进行检查,告诉我该数字是否在原始范围内。

Given an input of a range, where begin and ending numbers have the same number of digits (say, 2, 3, or 4), I want to write code to generate a set of regular expressions which, when checked against a number in turn, tell me whether that number is in the original range.

例如:if范围是145-387,则146、200和280都将与生成的正则表达式之一匹配,而144、390(用于表示290)和445(用于表示345)将不匹配。

For example: if the range is 145-387, then 146, 200, and 280 would all match one of the regular expressions generated, and 144, 390 (used to say 290), and 445 (used to say 345) would not.

我一直认为结果将是正则表达式的列表,例如:

I have been thinking the result would be a list of regular expressions like:

14[5-9]             // match 145-149
1[5-9]0-9]          // 150-199
2[0-9][0-9]         // 200-299
3[0-7][0-9]         // 300-379
38[0-7]             // 380-387

然后检查该数字的软件将进行测试,以查看所测试的3位代码是否与其中任何一个匹配。

and then software checking the number would test to see if the 3-digit code being tested matched any of these.

那么最好的方法是生成表达式集?

So what's the best way to generate the set of expressions?

我提出的最新(一系列)是:

The latest (in a series) that I've come up with is to:


  1. 确定两个范围数字不同的第一个数字(1145-1158,第一个不同的数字为第三个数字)

  2. 对于不同的数字,确定如果他们的第一位数字相差一个以上-如果是这样,则该范围使用自己的正则表达式(在我们的示例中为200-299)

  3. 以得到较低的范围:彼此为数字:从范围的开头添加第一个数字作为前缀,将数字增加一个,用0填充到相同的长度,并与在数字位置和所有填充位置具有9的数字配对。在我们的示例中,以4到5递增,填充以获取150,生成正则表达式以处理150-199。

  4. 以获得更高的范围:对于其他数字:具有第一个数字的前缀),从范围的末尾开始,将数字递减一位,填充0s,然后在所有填充的0位置和一个递减的数字与一个9s的数字配对。在我们的示例中,可处理300-379的正则表达式。

  1. determine the first digit at which the two range numbers differ (1145-1158, first different digit is the 3rd)
  2. for the different digits, determine if their first digits differ by more than one -- if so, the range for that gets its own regex (200-299 in our example)
  3. to get lower ranges: for each other digit: prefix by the first digit(s) from the beginning of the range, increment the digit by one, pad with 0s to the same length, and pair with a number that has 9 in the digit place and all the padded places. In our example, increment 4 to 5, pad to get 150, generate the regex to handle 150-199.
  4. to get higher ranges: for each other digit: prefix with first digit(s) from end of range, decrement digit by one, pad rest with 0s, pair with a number with 9s in all the padded 0 places and the decremented digit. In our example, the regex to handle 300-379.

我错过了什么吗?即使在上面,我也掩盖了一些细节,似乎可以从算法化的细节中受益匪浅。但是我想出的其他东西甚至比这还要混乱。

Am I missing something? There are details even in the above that I'm glossing over, it seems like something that would benefit from an algorithmic sword slashing through the details. But the other things I've come up with are even messier than this.

推荐答案

这是我的解决方案和复杂度 O(log n)(n是范围的结尾)。我相信这是最简单的方法:

Here's my solution and an algorithm with complexity O(log n) (n is the end of the range). I believe it is the simplest one here:

基本上,将您的任务分为以下步骤:

Basically, split your task into these steps:


  1. 逐渐削弱范围的开始

  2. 逐渐削弱

  3. 合并这两个。

弱是指找到可以由该特定数字的简单正则表达式表示的范围的结尾,例如:

By "weaken", I mean finding the end of range that can be represented by simple regex for this specific number, for example:

145 -> 149,150 -> 199,200 -> 999,1000 -> etc.

这里是倒数,对于结束

387 -> 380,379 -> 300,299 -> 0

合并将是注意到299-> 0和200-> 999和将它们组合成200-> 299。

Merging would be the process of noticing the overlap of 299->0 and 200->999 and combining those into 200->299.

结果,您将获得以下一组数字(第一个完整的列表,第二个反向:

In result, you would get this set of numbers (first list intact, second one inverted:

145, 149, 150, 199, 200, 299, 300, 379, 380, 387

现在,这是有趣的部分。将数字成对,并将其转换为范围:

Now, here is the funny part. Take the numbers in pairs, and convert them to ranges:

145-149, 150-199, 200-299, 300-379, 380-387

或在正则表达式中:

14[5-9], 1[5-9][0-9], 2[0-9][0-9], 3[0-7][0-9], 38[0-7]

以下是弱化的代码的样子:

public static int next(int num) {
    //Convert to String for easier operations
    final char[] chars = String.valueOf(num).toCharArray();
    //Go through all digits backwards
    for (int i=chars.length-1; i>=0;i--) {
        //Skip the 0 changing it to 9. For example, for 190->199
        if (chars[i]=='0') {
            chars[i] = '9';
        } else { //If any other digit is encountered, change that to 9, for example, 195->199, or with both rules: 150->199
            chars[i] = '9';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}

//Same thing, but reversed. 387 -> 380, 379 -> 300, etc
public static int prev(int num) {
    final char[] chars = String.valueOf(num).toCharArray();
    for (int i=chars.length-1; i>=0;i--) {
        if (chars[i] == '9') {
            chars[i] = '0';
        } else {
            chars[i] = '0';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}

其余都是技术细节,易于实现。这是此 O(log n)算法的实现: https://ideone.com/3SCvZf

The rest is technical details and is easy to implement. Here's an implementation of this O(log n) algorithm: https://ideone.com/3SCvZf

哦,顺便说一下,它也适用于其他范围,例如范围 1-321654 结果为:

Oh, and by the way, it works with other ranges too, for example for range 1-321654 the result is:

[1-9]
[1-9][0-9]
[1-9][0-9][0-9]
[1-9][0-9][0-9][0-9]
[1-9][0-9][0-9][0-9][0-9]
[1-2][0-9][0-9][0-9][0-9][0-9]
3[0-1][0-9][0-9][0-9][0-9]
320[0-9][0-9][0-9]
321[0-5][0-9][0-9]
3216[0-4][0-9]
32165[0-4]

对于 129-131 ,它是:

129
13[0-1]

这篇关于用于数字范围的正则表达式生成器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆