Regex in Java for finding duplicate consecutive words

Views: 274  Published: 2018/12/7 12:39:17  java regex


Problem description

I saw this as an answer for finding repeated words in a string. But when I use it, it treats "This" and "is" as the same word and deletes the "is".

Regex

"\\b(\\w+)\\b\\s+\\1"

Any idea why this happens?

Here is the code that I am using for duplicate removal:

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}

Recommended answer

Try this:

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);
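As a variation on the loop above, the same expression can be applied in a single pass with `Matcher.replaceAll("$1")`. This is only a sketch: the class name `DuplicateWords` and the method `collapse` are made up for illustration. One possible advantage is that the matched text is never passed back to `String.replaceAll` as a new regex, so only the spans the pattern actually matched are rewritten.

```java
import java.util.regex.Pattern;

final class DuplicateWords {
    // Same expression as in the answer; (?i) already makes it case-insensitive,
    // so no extra Pattern.CASE_INSENSITIVE flag is passed here.
    private static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Hypothetical helper: collapses each run of a repeated word
    // to its first occurrence.
    static String collapse(String input) {
        // "$1" in the replacement refers to the first captured group of each match.
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("This is is a a a test test")); // This is a test
    }
}
```

Because the non-capturing group may repeat, a run such as "a a a" is reduced to a single "a" in one match.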

Java regular expressions are explained very well in the API documentation of the Pattern class. Here is the expression again, with spaces added to separate its parts:

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

(?i)     turn on case-insensitive matching
\b       match a word boundary
[a-z]+   match a word of one or more letters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
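The word boundary after the back reference can be checked with a small experiment. The sketch below compares the first pattern quoted in the question (no `\b` after `\1`) with the same pattern plus a trailing `\b`. The class name `BoundaryDemo`, the helper `found`, and the sample inputs are assumptions chosen for illustration, not taken from the original post.

```java
import java.util.regex.Pattern;

final class BoundaryDemo {
    // First pattern quoted in the question: nothing follows the back reference.
    static final Pattern WITHOUT_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.CASE_INSENSITIVE);

    // Same pattern with \b after \1, as the answer's expression has.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern matches anywhere in the text.
    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Without the trailing \b, "the" followed by the prefix of "theory" matches.
        System.out.println(found(WITHOUT_BOUNDARY, "the theory")); // true
        // With the trailing \b, the back reference must end at a word boundary.
        System.out.println(found(WITH_BOUNDARY, "the theory"));    // false
        // A real repetition is found by both patterns.
        System.out.println(found(WITH_BOUNDARY, "the the"));       // true
    }
}
```

Without the trailing boundary, the back reference is free to match just the beginning of the following word, which is one way a non-duplicate can be reported as a duplicate.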
