From wildcards to regular expressions
Question
I want to allow the two main wildcards, ? and *, to filter my data.
Here is what I'm doing now (as I have seen on many websites):
public boolean contains(String data, String filter) {
    if (data == null || data.isEmpty()) {
        return false;
    }
    String regex = filter.replace(".", "[.]")
                         .replace("?", ".")
                         .replace("*", ".*");
    return Pattern.matches(regex, data);
}
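To make the concern concrete, here is a minimal sketch of how the three-replacement approach behaves when the filter contains other regex metacharacters. The class and method names (NaiveWildcardDemo, toRegex, failsToCompile, matches) are made up for illustration; only the three replace() calls come from the code above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NaiveWildcardDemo {
    // Same three replacements as in contains() above.
    static String toRegex(String filter) {
        return filter.replace(".", "[.]")
                     .replace("?", ".")
                     .replace("*", ".*");
    }

    // Returns true if the generated regex cannot even be compiled.
    static boolean failsToCompile(String filter) {
        try {
            Pattern.compile(toRegex(filter));
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Whole-string match of data against the converted filter.
    static boolean matches(String filter, String data) {
        return Pattern.matches(toRegex(filter), data);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("a.b?c*"));         // a[.]b.c.*
        System.out.println(failsToCompile("(draft*")); // true: unclosed group
        System.out.println(matches("a|b", "a"));       // true: '|' acts as alternation
        System.out.println(matches("a|b", "a|b"));     // false: the literal text no longer matches
    }
}
```

So an unescaped ( can make the filter throw, and an unescaped | silently changes what the filter means.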
But shouldn't we escape all the other regex special chars, like | or (, etc.? And also, maybe we could preserve ? and * if they are preceded by a \? For example, something like:
filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
.replaceAll("([^\\\\]|^)\\?", "$1.") // 2. replace any ? that isn't preceded by a \ by .
.replaceAll("([^\\\\]|^)\\*", "$1.*") // 3. replace any * that isn't preceded by a \ by .*
.replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1"); // 4. replace any \ that isn't followed by a ? or a * (possibly due to step 2 and 3) by \\
What do you think about it? If you agree, am I missing any other regex special char?
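For comparison, here is a sketch of a different technique from the replaceAll chain above: walk the filter one character at a time and quote every literal run with Pattern.quote, so there is no list of special characters to keep complete. This is not the approach taken in this thread. The class name WildcardToRegex and its methods are hypothetical, and it assumes that a backslash makes the following character literal and that a trailing lone backslash is itself literal.

```java
import java.util.regex.Pattern;

class WildcardToRegex {
    // Converts a wildcard filter to a regex: ? -> ".", * -> ".*",
    // everything else is quoted so it can only match literally.
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < filter.length(); i++) {
            char c = filter.charAt(i);
            if (c == '\\' && i + 1 < filter.length()) {
                // Assumption: backslash escapes the next character.
                literal.append(filter.charAt(++i));
            } else if (c == '?' || c == '*') {
                flush(literal, regex);
                regex.append(c == '?' ? "." : ".*");
            } else {
                // Includes a trailing lone backslash.
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Emits the pending literal run, quoted, and clears the buffer.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // Whole-string match of data against the converted filter.
    static boolean matches(String filter, String data) {
        return Pattern.matches(toRegex(filter), data);
    }

    public static void main(String[] args) {
        System.out.println(matches("a|b", "a|b"));          // true
        System.out.println(matches("(draft*", "(draft v2")); // true
        System.out.println(matches("what\\?", "what?"));     // true: escaped ? is literal
        System.out.println(matches("what\\?", "whatX"));     // false
    }
}
```

The trade-off is that the generated regex is harder to read (it is full of \Q...\E sections), but there is no way to miss a special character.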
Edit #1 (after having taken into account dan1111's and m.buettner's advice):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars, but \, ? and *
regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
What about this one?
Edit #2 (after having taken into account dan1111's advice):
// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars (if not already escaped by user), but \, ? and *
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
Goal in sight?
Solution
Here is the solution I finally adopted (using the Apache Commons Lang library):
public static boolean isFiltered(String data, String filter) {
    // no filter: return true
    if (StringUtils.isBlank(filter)) {
        return true;
    }
    // a filter but no data: return false
    else if (StringUtils.isBlank(data)) {
        return false;
    }
    // a filter and a data:
    else {
        // case insensitive
        data = data.toLowerCase();
        filter = filter.toLowerCase();
        // .matches() auto-anchors, so add [*] (i.e. "containing")
        String regex = "*" + filter + "*";
        // replace any pair of backslashes by [*]
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        // minimize unescaped redundant wildcards
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        // escape unescaped regexps special chars, but [\], [?] and [*]
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        // replace unescaped [?] by [.]
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        // replace unescaped [*] by [.*]
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        // return whether data matches regex or not
        return data.matches(regex);
    }
}
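A few sample calls may help show what this does in practice. The sketch below repeats the same five replaceAll steps in a standalone class so it runs without Apache Commons Lang. The class name IsFilteredDemo is made up, and the local isBlank is only a stand-in that assumes null, empty, or whitespace-only strings count as blank.

```java
class IsFilteredDemo {
    // Stand-in for StringUtils.isBlank (assumption: null, empty,
    // or whitespace-only strings are treated as blank).
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Same replaceAll steps, in the same order, as isFiltered() above.
    static boolean isFiltered(String data, String filter) {
        if (isBlank(filter)) {
            return true;
        }
        if (isBlank(data)) {
            return false;
        }
        String regex = "*" + filter.toLowerCase() + "*";
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        return data.toLowerCase().matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isFiltered("Hello World", "WOR?D"));   // true: case-insensitive "contains"
        System.out.println(isFiltered("axb", "a.b"));             // false: '.' is literal
        System.out.println(isFiltered("xa.by", "a.b"));           // true
        System.out.println(isFiltered("what? now", "what\\?"));   // true: escaped ? is literal
        System.out.println(isFiltered("whatx now", "what\\?"));   // false
        System.out.println(isFiltered("anything", ""));           // true: no filter
        System.out.println(isFiltered("", "a"));                  // false: filter but no data
    }
}
```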
Many thanks to @dan1111 and @m.buettner for their precious help ;)