Filtering using regex and returning the matched number

Question

Here I am trying to filter specific phone numbers out of text using regex. A phone number may contain exploits like this, with digits written out as words:

4023one345233 should be treated as 40231345233 and then filtered.

This code works fine when there are no exploits:

Code 1:

$arrwords = array(0=>'zero',1=>'one',2=>'two',3=>'three',4=>'four',5=>'five',6=>'six',7=>'seven',8=>'eight',9=>'nine');
preg_match_all('/[A-Za-z]+/', $text, $matches);
$arr=$matches[0];
foreach($arr as $v)
{
    $v = strtolower($v);
    if(in_array($v,$arrwords))
    {
        $text= str_replace($v,array_search($v,$arrwords),$text);
    }
}
foreach ($words as $word){

    $pattern = '/^(?=.{8,14})b$\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{5}\)?[\s-]?\d{4,5}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)$^|^2(?:0[01378]|3[0189]|4[017]|8[0-46-9]|9[012])\d{7}|1(?:(?:1(?:3[0-48]|[46][0-4]|5[012789]|7[0-49]|8[01349])|21[0-7]|31[0-8]|[459]1\d|61[0-46-9]))\d{6}|1(?:2(?:0[024-9]|2[3-9]|3[3-79]|4[1-689]|[58][02-9]|6[0-4789]|7[013-9]|9\d)|3(?:0\d|[25][02-9]|3[02-579]|[468][0-46-9]|7[1235679]|9[24578])|4(?:0[03-9]|2[02-5789]|[37]\d|4[02-69]|5[0-8]|[69][0-79]|8[0-5789])|5(?:0[1235-9]|2[024-9]|3[0145689]|4[02-9]|5[03-9]|6\d|7[0-35-9]|8[0-468]|9[0-5789])|6(?:0[034689]|2[0-689]|[38][013-9]|4[1-467]|5[0-69]|6[13-9]|7[0-8]|9[0124578])|7(?:0[0246-9]|2\d|3[023678]|4[03-9]|5[0-46-9]|6[013-9]|7[0-35-9]|8[024-9]|9[02-9])|8(?:0[35-9]|2[1-5789]|3[02-578]|4[0-578]|5[124-9]|6[2-69]|7\d|8[02-9]|9[02569])|9(?:0[02-589]|2[02-689]|3[1-5789]|4[2-9]|5[0-579]|6[234789]|7[0124578]|8\d|9[2-57]))\d{6}|1(?:2(?:0(?:46[1-4]|87[2-9])|545[1-79]|76(?:2\d|3[1-8]|6[1-6])|9(?:7(?:2[0-4]|3[2-5])|8(?:2[2-8]|7[0-4789]|8[345])))|3(?:638[2-5]|647[23]|8(?:47[04-9]|64[015789]))|4(?:044[1-7]|20(?:2[23]|8\d)|6(?:0(?:30|5[2-57]|6[1-8]|7[2-8])|140)|8(?:052|87[123]))|5(?:24(?:3[2-79]|6\d)|276\d|6(?:26[06-9]|686))|6(?:06(?:4\d|7[4-79])|295[567]|35[34]\d|47(?:24|61)|59(?:5[08]|6[67]|74)|955[0-4])|7(?:26(?:6[13-9]|7[0-7])|442\d|50(?:2[0-3]|[3-68]2|76))|8(?:27[56]\d|37(?:5[2-5]|8[239])|84(?:3[2-58]))|9(?:0(?:0(?:6[1-8]|85)|52\d)|3583|4(?:66[1-8]|9(?:2[01]|81))|63(?:23|3[1-4])|9561))\d{3}|176888[234678]\d{2}|16977[23]\d{3}|7(?:[1-4]\d\d|5(?:0[0-8]|[13-9]\d|2[0-35-9])|624|7(?:0[1-9]|[1-7]\d|8[02-9]|9[0-689])|8(?:[014-9]\d|[23][0-8])|9(?:[04-9]\d|1[02-9]|2[0-35-9]|3[0-689]))\d{6}|76(?:0[012]|2[356]|4[0134]|5[49]|6[0-369]|77|81|9[39])\d{6}|80(?:0\d{6,7}|8\d{7})|500\d{6}|(?:87[123]|9(?:[01]\d|8[0-3]))\d{7}|8(?:4[2-5]|70)\d{7
}|70\d{8}|56\d{8}|(?:3[0347]|55)\d{8}|8(?:001111|45464\d)$|(?:\((\+?\d+)?\)|(\+\d{0,3}))? ?\d{2,3}([-\.]?\d{2,3} ?){3,4}/';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );            
    $this->pushToResultSet($matches);
}

With help from SO, I arrived at this code, which filters numbers containing the exploits mentioned above.

http://ideone.com/8UW22U - test link

Code 2:

$arrwords = array_flip(array(0=>'zero',1=>'one',2=>'two',3=>'three',4=>'four',5=>'five',6=>'six',7=>'seven',8=>'eight',9=>'nine'));

$s = "my long STRING with some Numbers 402three1345233 4023one345233";

$sanitised = array();    
foreach (explode(" ", $s) as $word) {
    $num = strtr(strtolower($word), $arrwords);
    $sanitised[] = is_numeric($num) ? str_repeat("*", strlen($word)) : $word;        
}

echo implode(" ", $sanitised);

But, as in my first code, I want to match the pattern only after the number has been found, and then return the matched pattern.

Here I have tried to port code 2 into code 1:

foreach (explode(" ", $s) as $word) {
    $num = strtr(strtolower($word), $arrwords);
    if(is_numeric($num)){ 
         $pattern = 'regex_above';
        preg_match_all($pattern, <$text?????>, $matches, PREG_OFFSET_CAPTURE );            
        $this->pushToResultSet($matches);

    }
}

Can someone help me correct this?

Note: the length of the original number and of the matched pattern should be the same. That is, 4023three345233 should be masked as *************** (15 characters, the length of the original word), not *********** (11 characters, the length of the converted number).

Recommended answer

If I understand your question correctly, you want to replace a string of numbers (possibly containing written numbers) with asterisks. The number of asterisks must equal the number of characters in the original string.

In the code below, the regex matches any word that contains a run of 3 to 7 digits, where each digit is either a numeral or a written digit (zero to nine).

$s = "123 onetwothree 1two3 one dog";
$new_words = array();
$numbers = array();
$pattern = "#(\d|zero|one|two|three|four|five|six|seven|eight|nine){3,7}#i";
foreach(explode(" ", $s) as $word) {
    if(preg_match($pattern, $word, $matches)) {
        $new_words[] = str_repeat("*", strlen($word));
        $numbers[] = $matches[0];
    } else {
        $new_words[] = $word;
    }
}

$new_s = implode(" ", $new_words);
print $new_s . "\n";
print implode(" ", $numbers) . "\n";

Gives:

*** *********** ***** one dog
123 onetwothree 1two3
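
Note that the pattern above is not anchored, so a word only needs to contain a run of digits somewhere in order to be masked. If only words made up entirely of digits should match, one option is to anchor the pattern. This is a minimal sketch that assumes this stricter behaviour is wanted, which the question does not state; the variable name $anchored is made up for the illustration.

```php
<?php
// Same alternation as in the answer, but anchored with ^ and $ so that the
// whole word must consist of 3 to 7 digits or digit words (assumption: this
// stricter behaviour is wanted).
$anchored = '#^(?:\d|zero|one|two|three|four|five|six|seven|eight|nine){3,7}$#i';

foreach (array('1two3', 'abc123def', 'onetwothree', 'one') as $word) {
    $is_number = preg_match($anchored, $word) === 1;
    // Mask matching words with as many asterisks as the word has characters.
    print $word . ' => ' . ($is_number ? str_repeat('*', strlen($word)) : $word) . "\n";
}
```

With the unanchored pattern from the answer, a word such as abc123def would be masked as well, because it contains the run 123.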

The regex in your code is extremely long, and adding 'zero|one|...' to it might not be feasible for you. Another solution could be to:

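  • Get the number of characters of each word in the string: $word_lengths
  • Replace written numbers with their numeric values, e.g. "one" becomes "1"
  • Match against your long regex
  • If it matches, create a string of asterisks based on $word_lengths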
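
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name mask_numbers is hypothetical, the short pattern /^\d{8,14}$/ is a stand-in for the long phone-number regex from the question, and words are assumed to be separated by single spaces. The original word length is recorded per word inside the loop instead of in a separate $word_lengths array.

```php
<?php
// Hypothetical helper: masks words that match $pattern once written digits
// ("one", "two", ...) have been converted to numerals.
function mask_numbers(string $text, string $pattern): array
{
    // Map digit words to numerals for strtr(), e.g. 'one' => '1'.
    $digit_words = array(
        'zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3', 'four' => '4',
        'five' => '5', 'six' => '6', 'seven' => '7', 'eight' => '8', 'nine' => '9',
    );

    $masked = array();
    $found = array();
    foreach (explode(' ', $text) as $word) {
        // Step 1: remember the length of the original word.
        $word_length = strlen($word);
        // Step 2: replace written digits with numerals.
        $normalised = strtr(strtolower($word), $digit_words);
        // Step 3: match the normalised word against the pattern.
        if (preg_match($pattern, $normalised, $m)) {
            // Step 4: mask using the original length, not the normalised one.
            $masked[] = str_repeat('*', $word_length);
            $found[] = $m[0];
        } else {
            $masked[] = $word;
        }
    }
    return array(implode(' ', $masked), $found);
}

// Stand-in pattern (assumption): a word consisting of 8 to 14 digits.
$pattern = '/^\d{8,14}$/';
list($masked, $found) = mask_numbers('call 4023one345233 or 402three1345233 now', $pattern);
print $masked . "\n";
print implode(' ', $found) . "\n";
```

Matching is done on the normalised copy while the asterisks are built from the original word, so the masked text keeps the same length as the input, as the question requires.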
