Running multiple regex patterns on a String


Question

Assuming I have a List<String> and an empty List<Pattern>, is this the best way to turn the words in the list into Pattern objects?

for(String word : stringList) {
    patterns.add(Pattern.compile("\\b(" + word + ")\\b"));
}

And then, to run these on a string later:

for(Pattern pattern : patterns) {
    Matcher matcher = pattern.matcher(myString);
    if(matcher.find()) { // find() looks for the word anywhere; matches() would require the whole string to match
         myString = matcher.replaceAll("String[$1]");
    }
}

The replaceAll bit is just an example, but $1 would be used most of the time when I use this.

Is there a more efficient way? I feel like this is somewhat clunky. I'm using 80 Strings in the list, by the way, though the Strings are configurable, so there won't always be that many.

This is designed to be something of a swearing filter, so I'll let you assume the words in the List.

An example of input would be "You're a <curse>"; the output would be "You're a *****" for this word. This may not always be the case, though: at some point I may be reading from a HashMap<String, String> where the key is the capture group and the value is the replacement.

Example:

String replacement = hashMap.get(matcher.group(1));
if(replacement == null) {
    // '*' is not special in a replacement string (only '\' and '$' are), so no escaping is needed.
    myString = matcher.replaceAll("****");
} else {
    myString = matcher.replaceAll(replacement);
}


Recommended answer

You can join these patterns into a single pattern using alternation with |:

Pattern pattern = Pattern.compile("\\b(" + String.join("|",stringList) + ")\\b");

If you cannot use Java 8 (and so do not have the String.join method), or if you need to escape the words to prevent characters in them from being interpreted as regex metacharacters, you can build this regex with a manual loop:

StringBuilder regex = new StringBuilder("\\b(");
for (String word : stringList) {
    regex.append(Pattern.quote(word));
    regex.append("|");
}
regex.setLength(regex.length() - 1); // delete last added "|"
regex.append(")\\b");
Pattern pattern = Pattern.compile(regex.toString());
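
On Java 8 or later, the quoting and joining can also be done in one expression with streams. The following is only a sketch of that idea, not part of the original answer: the class name CombinedPattern, the method name build, and the placeholder words are made up for illustration, and rejecting an empty list is an assumption about what a caller would want (an empty alternation would otherwise match the empty string at every word boundary).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, named here only for illustration.
class CombinedPattern {

    // Builds \b(word1|word2|...)\b with every word quoted, so that regex
    // metacharacters inside a word are matched literally.
    public static Pattern build(List<String> words) {
        if (words.isEmpty()) {
            // Assumption: an empty word list is treated as a configuration error.
            throw new IllegalArgumentException("word list must not be empty");
        }
        String regex = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "\\b(", ")\\b"));
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("darn", "heck"); // placeholder words
        Pattern pattern = build(words);
        // prints "Well, [heck] and [darn] it"
        System.out.println(pattern.matcher("Well, heck and darn it").replaceAll("[$1]"));
    }
}
```

Compiling the pattern once and reusing it means the input is scanned a single time per string, rather than once per configured word.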






To use different replacements for the different words, you can apply the pattern with this loop:

Matcher m = pattern.matcher(myString);
StringBuilder out = new StringBuilder();
int pos = 0;
while (m.find()) {
    out.append(myString, pos, m.start());
    String matchedWord = m.group(1);
    String replacement = matchedWord.replaceAll(".", "*");
    out.append(replacement);
    pos = m.end();
}
out.append(myString, pos, myString.length());
myString = out.toString();

You can look up the replacement for the matched word any way you like. The example generates a replacement string of asterisks of the same length as the matched word.
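
For the HashMap<String, String> case mentioned in the question, the lookup might be sketched as follows. This variant uses Matcher.appendReplacement and appendTail instead of tracking positions by hand. The class name MapReplacer, the method name filter, and the sample words and replacements are assumptions made up for illustration. Falling back to asterisks when a word has no map entry follows the question's example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class MapReplacer {

    // Replaces each match of group 1 with its entry in `replacements`,
    // falling back to asterisks of the same length when there is no entry.
    public static String filter(String input, Pattern pattern, Map<String, String> replacements) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer(); // StringBuffer works with appendReplacement on Java 8
        while (m.find()) {
            String word = m.group(1);
            String replacement = replacements.get(word);
            if (replacement == null) {
                replacement = word.replaceAll(".", "*");
            }
            // quoteReplacement keeps '$' and '\' in the replacement from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\b(darn|heck)\\b"); // placeholder words
        Map<String, String> replacements = new HashMap<>();
        replacements.put("heck", "$#@!"); // sample replacement containing '$'
        // prints "What the $#@!, **** it"
        System.out.println(filter("What the heck, darn it", pattern, replacements));
    }
}
```

Without Matcher.quoteReplacement, a replacement such as "$#@!" would be rejected as an illegal group reference, so quoting is worth keeping whenever the replacements come from configuration.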
