From wildcards to regular expressions

Problem description


I want to allow the two main wildcards ? and * to filter my data.

Here is what I'm doing now (as I've seen on many websites):

public boolean contains(String data, String filter) {
    if(data == null || data.isEmpty()) {
        return false;
    }
    String regex = filter.replace(".", "[.]")
                         .replace("?", ".")
                         .replace("*", ".*");
    return Pattern.matches(regex, data);
}
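To make the limits of this conversion concrete, here is a small check. It reuses the same three replace calls and feeds them filters containing | and (, which the conversion passes through to the regex engine untouched. The class name NaiveWildcardCheck and the helper failsToCompile are made up for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class NaiveWildcardCheck {
    // Same conversion as the contains() method above, made static for the demo.
    static boolean naiveContains(String data, String filter) {
        if (data == null || data.isEmpty()) {
            return false;
        }
        String regex = filter.replace(".", "[.]")
                             .replace("?", ".")
                             .replace("*", ".*");
        return Pattern.matches(regex, data);
    }

    // Returns true when the converted filter is not a valid regex at all.
    static boolean failsToCompile(String filter) {
        try {
            naiveContains("x", filter);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(naiveContains("a", "a|b"));   // true: "|" acts as alternation
        System.out.println(naiveContains("a|b", "a|b")); // false: the literal text is rejected
        System.out.println(failsToCompile("a(b"));       // true: unclosed group
    }
}
```

With these inputs, the filter a|b accepts the data a, rejects the literal text a|b, and a(b does not compile at all.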

But shouldn't we escape all the other regex special chars, like | or (, etc.? And also, maybe we could preserve ? and * if they are preceded by a \? For example, something like:

filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
      .replaceAll("([^\\\\]|^)\\?", "$1.")           // 2. replace any ? that isn't preceded by a \ by .
      .replaceAll("([^\\\\]|^)\\*", "$1.*")          // 3. replace any * that isn't preceded by a \ by .*
      .replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1");    // 4. replace any \ that isn't followed by a ? or a * (possibly due to step 2 and 3) by \\

What do you think about it? If you agree, am I missing any other regex special char?
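One way to avoid having to enumerate the special characters is to let Pattern.quote handle every literal stretch of the filter and to translate only ? and * by hand. The sketch below is an alternative to the replaceAll chain above, not a cleaned-up version of it. The class WildcardToRegex and its methods are hypothetical names, and it assumes that a backslash makes the following character literal.

```java
import java.util.regex.Pattern;

final class WildcardToRegex {
    // Converts a wildcard filter to a regex; "\" makes the next character literal (assumption).
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < filter.length(); i++) {
            char c = filter.charAt(i);
            if (c == '\\' && i + 1 < filter.length()) {
                // Escaped character: keep the next one as literal text.
                literal.append(filter.charAt(++i));
            } else if (c == '?' || c == '*') {
                // Wildcard: emit the pending literal text, then the regex equivalent.
                flush(literal, regex);
                regex.append(c == '?' ? "." : ".*");
            } else {
                // Ordinary character (or a lone trailing backslash).
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Appends the pending literal text, quoted so no character keeps a regex meaning.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean matches(String data, String filter) {
        return Pattern.matches(toRegex(filter), data);
    }
}
```

Because the literal parts never reach the regex engine unquoted, the question of which characters to list does not arise; the trade-off is a loop instead of a chain of one-liners.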


Edit #1 (after taking dan1111's and m.buettner's advice into account):

// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regex special chars, but not \, ? and *
regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");

What about this one?
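One practical difference between the first attempt and Edit #1 is how "not preceded by a \" is expressed. The capture group ([^\\]|^) consumes the preceding character, so that character cannot take part in the next match, whereas the lookbehind (?<!\\) only inspects it. The sketch below isolates the ? step from both versions; the class and method names are made up for this illustration.

```java
final class LookbehindDemo {
    // "?" step from the first attempt: the character before "?" is part of the match.
    static String withCapture(String s) {
        return s.replaceAll("([^\\\\]|^)\\?", "$1.");
    }

    // Same step as written in Edit #1: the lookbehind consumes nothing.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!\\\\)[?]", ".");
    }

    public static void main(String[] args) {
        System.out.println(withCapture("a??"));     // a.?  (second "?" is missed)
        System.out.println(withLookbehind("a??"));  // a..
        System.out.println(withLookbehind("a\\?")); // a\?  (escaped "?" is left alone)
    }
}
```

With the capture-group form, the second of two adjacent wildcards is skipped because the first one was already consumed as the "preceding character" of a failed attempt.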


Edit #2 (after taking dan1111's advice into account):

// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regex special chars (if not already escaped by the user), but not \, ? and *
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");

Goal in sight?
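The only change between Edit #1 and Edit #2 is the lookbehind added to the escape step. A short sketch of what that changes for a filter in which the user has already escaped a dot (the class and method names are hypothetical; the two patterns are copied from the edits above):

```java
final class EscapeStepDemo {
    // Escape step from Edit #1: every listed metacharacter gets a backslash.
    static String escapeAlways(String s) {
        return s.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
    }

    // Escape step from Edit #2: characters already preceded by a backslash are skipped.
    static String escapeUnlessEscaped(String s) {
        return s.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
    }

    public static void main(String[] args) {
        String filter = "a\\.b"; // the four characters a \ . b
        System.out.println(escapeAlways(filter));                       // a\\.b
        System.out.println(escapeUnlessEscaped(filter));                // a\.b
        System.out.println("a.b".matches(escapeAlways(filter)));        // false
        System.out.println("a.b".matches(escapeUnlessEscaped(filter))); // true
    }
}
```

In the Edit #1 form the user's backslash ends up escaped itself, so the resulting regex expects a literal backslash followed by any character; the Edit #2 form keeps the user's escape intact.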

Solution

Here is the solution I finally adopted (using the Apache Commons Lang library):

public static boolean isFiltered(String data, String filter) {
    // no filter: return true
    if (StringUtils.isBlank(filter)) {
        return true;
    }
    // a filter but no data: return false
    else if (StringUtils.isBlank(data)) {
        return false;
    }
    // both a filter and data:
    else {
        // case insensitive
        data = data.toLowerCase();
        filter = filter.toLowerCase();
        // .matches() auto-anchors, so add [*] (i.e. "containing")
        String regex = "*" + filter + "*";
        // replace any run of an even number of backslashes by [*]
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        // minimize unescaped redundant wildcards
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        // escape unescaped regex special chars, but not [\], [?] and [*]
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        // replace unescaped [?] by [.]
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        // replace unescaped [*] by [.*]
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        // return whether data matches regex or not
        return data.matches(regex);
    }
}
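For readers who want to try the method without adding a dependency, here is a self-contained variant with a few sample calls. It follows the same five replaceAll steps; the only substitution is a local isBlank helper standing in for StringUtils.isBlank (an assumption that null, empty and whitespace-only strings count as blank). The class name FilterDemo is made up for this sketch.

```java
final class FilterDemo {
    // Stand-in for StringUtils.isBlank (assumption: null, empty or whitespace-only is blank).
    static boolean isBlank(String s) {
        return s == null || s.chars().allMatch(Character::isWhitespace);
    }

    static boolean isFiltered(String data, String filter) {
        if (isBlank(filter)) {
            return true;
        }
        if (isBlank(data)) {
            return false;
        }
        // surround with [*] because matches() anchors at both ends
        String regex = "*" + filter.toLowerCase() + "*";
        // even-length runs of backslashes become [*]
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        // collapse redundant unescaped wildcards
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        // escape unescaped regex special chars
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        // translate the unescaped wildcards
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        return data.toLowerCase().matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isFiltered("Hello World", "wor*")); // true: case-insensitive "contains"
        System.out.println(isFiltered("a.b", "a.b"));          // true: "." is literal
        System.out.println(isFiltered("axb", "a.b"));          // false
        System.out.println(isFiltered("what?", "t\\?"));       // true: escaped "?" is literal
        System.out.println(isFiltered("whatever", "t\\?"));    // false
    }
}
```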

Many thanks to @dan1111 and @m.buettner for their invaluable help ;)
