Function To Create Regex Matching a Number Range

Problem description

I am working with the Amazon Mechanical Turk API and it will only allow me to use regular expressions to filter a field of data.

I would like to input an integer range to a function, such as 256-311 or 45-1233, and return a regex that would match only that range.

A regex matching 256-321 would be:

\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b
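A hand-written pattern like this can be sanity-checked by trying it against every integer in a small interval. The following is a minimal PHP sketch of that idea, not part of the original question; the 0..999 bound and the variable names are assumptions chosen for illustration.

```php
<?php
// Hand-written pattern from the question, wrapped in PHP regex delimiters.
$pattern = '/\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b/';

// Collect every integer in 0..999 whose decimal string the pattern accepts.
// The upper bound 999 is an arbitrary choice for this check.
$matched = array();
for ($n = 0; $n <= 999; $n++) {
    if (preg_match($pattern, (string)$n) === 1) {
        $matched[] = $n;
    }
}

// true when the accepted set is exactly 256..321.
var_dump($matched === range(256, 321));
```

The same loop can be pointed at any generated pattern to check it before use.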

That part is fairly easy, but I am having trouble with the loop to create this regex.

I am trying to build a function defined like this:

function getRangeRegex( int fromInt, int toInt)
{

      return regexString;
}

I looked all over the web and I am surprised that it doesn't look like anyone has solved this in the past. It is a difficult problem...

Thanks for your time.

Solution

Here's a quick hack:

<?php

function regex_range($from, $to) {

  if($from < 0 || $to < 0) {
    throw new Exception("Negative values not supported"); 
  }

  if($from > $to) {
    throw new Exception("Invalid range $from..$to, from > to"); 
  }

  $ranges = array($from);
  $increment = 1;
  $next = $from;
  $higher = true;

  while(true) {

    $next += $increment;

    if($next + $increment > $to) {
      if($next <= $to) {
        $ranges[] = $next;
      }
      $increment /= 10;
      $higher = false;
    }
    else if($next % ($increment*10) === 0) {
      $ranges[] = $next;
      $increment = $higher ? $increment*10 : $increment/10;
    }

    if(!$higher && $increment < 10) {
      break;
    }
  }

  $ranges[] = $to + 1;

  $regex = '/^(?:';

  for($i = 0; $i < sizeof($ranges) - 1; $i++) {
    $str_from = (string)($ranges[$i]);
    $str_to = (string)($ranges[$i + 1] - 1);

    for($j = 0; $j < strlen($str_from); $j++) {
      if($str_from[$j] == $str_to[$j]) {
        $regex .= $str_from[$j];
      }
      else {
        $regex .= "[" . $str_from[$j] . "-" . $str_to[$j] . "]";
      }
    }
    $regex .= "|";
  }

  return substr($regex, 0, strlen($regex)-1) . ')$/';
}

function test($from, $to) {
  try {
    printf("%-10s %s\n", $from . '-' . $to, regex_range($from, $to));
  } catch (Exception $e) {
    echo $e->getMessage() . "\n";
  }
}

test(2, 8);
test(5, 35);
test(5, 100);
test(12, 1234);
test(123, 123);
test(256, 321);
test(256, 257);
test(180, 195);
test(2,1);
test(-2,4);

?>

which produces:

2-8        /^(?:[2-7]|8)$/
5-35       /^(?:[5-9]|[1-2][0-9]|3[0-5])$/
5-100      /^(?:[5-9]|[1-9][0-9]|100)$/
12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/
123-123    /^(?:123)$/
256-321    /^(?:25[6-9]|2[6-9][0-9]|3[0-2][0-1])$/
256-257    /^(?:256|257)$/
180-195    /^(?:18[0-9]|19[0-5])$/
Invalid range 2..1, from > to
Negative values not supported

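Not properly tested, use at your own risk! For instance, the 256-321 pattern above ends in 3[0-2][0-1], which accepts only 300, 301, 310, 311, 320 and 321 out of 300-321.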

And yes, the generated regex could be written more compactly in many cases, but I leave that as an exercise for the reader :)
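Some of the patterns listed above are narrower than the requested range: the last alternative for 256-321, `3[0-2][0-1]`, and the last one for 12-1234, `1[0-2][0-3][0-4]`, both miss numbers such as 302. The sketch below shows one possible alternative, which is a different technique from the quick hack above. It splits the range at boundaries where the low digits become all 9s (counting up from the lower bound) or all 0s (counting down from the upper bound plus one), so that every sub-range can be described digit by digit. The names `split_range`, `digits_pattern` and `regex_range_checked` are hypothetical, and the code assumes non-negative integers comfortably below `PHP_INT_MAX`.

```php
<?php
// Hypothetical helper: return the sorted upper ends ("stops") of the
// sub-ranges that [$min, $max] is cut into.
function split_range(int $min, int $max): array {
    $stops = array($max);

    // Fill the low digits of $min with 9s: for 256 this gives 259, 299, ...
    for ($pow = 10; ; $pow *= 10) {
        $stop = intdiv($min, $pow) * $pow + $pow - 1;
        if ($stop > $max) break;
        $stops[] = $stop;
    }

    // Zero the low digits of $max + 1 and subtract one: for 321 this gives 319, 299, ...
    for ($pow = 10; ; $pow *= 10) {
        $stop = intdiv($max + 1, $pow) * $pow - 1;
        if ($stop <= $min) break;
        $stops[] = $stop;
    }

    $stops = array_unique($stops);
    sort($stops);
    return $stops;
}

// Hypothetical helper: describe one sub-range digit by digit.
// $from and $to have the same number of digits by construction.
function digits_pattern(int $from, int $to): string {
    $a = (string)$from;
    $b = (string)$to;
    $out = '';
    for ($i = 0; $i < strlen($a); $i++) {
        $out .= ($a[$i] === $b[$i]) ? $a[$i] : '[' . $a[$i] . '-' . $b[$i] . ']';
    }
    return $out;
}

// Build an anchored regex for the inclusive range $from..$to.
function regex_range_checked(int $from, int $to): string {
    if ($from < 0 || $from > $to) {
        throw new InvalidArgumentException("Unsupported range $from..$to");
    }
    $parts = array();
    $start = $from;
    foreach (split_range($from, $to) as $stop) {
        $parts[] = digits_pattern($start, $stop);
        $start = $stop + 1;
    }
    return '/^(?:' . implode('|', $parts) . ')$/';
}

echo regex_range_checked(256, 321), "\n";
```

For 256-321 the sub-ranges are 256-259, 260-299, 300-319 and 320-321, which are the same four alternatives as the hand-written pattern in the question. A brute-force loop over a range of integers is a cheap way to check any other input before relying on the result.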
