Regex in Java for finding duplicate consecutive words
Question
I saw this as an answer for finding repeated words in a string. But when I use it, it thinks "This" and "is" are the same and deletes the "is".
Regex:
"\\b(\\w+)\\b\\s+\\1"
Any idea why this is happening?
Here is the code that I am using for duplicate removal:
public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    // Backslashes must be doubled inside a Java string literal.
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}
Recommended answer
Try this:
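// Requires java.util.regex.Pattern and java.util.regex.Matcher.
String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
// Replace each run of a repeated word with a single copy of that word.
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);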
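One thing to watch with the loop above: String.replaceAll treats m.group() as a fresh regex with no word boundaries, so a match such as "is is" can also be replaced where it appears inside other text, for example in "This is". That is probably what the question describes. A minimal sketch of one way around it, assuming it is acceptable to let the Matcher do the substitution with the "$1" group reference (the class and method names here are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DedupeWords {
    // Same pattern as in the answer; the inline (?i) flag already makes it case-insensitive.
    private static final Pattern DUPLICATES =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Hypothetical helper: collapses each run of a repeated word to its first occurrence.
    public static String removeDuplicateWords(String input) {
        Matcher m = DUPLICATES.matcher(input);
        // "$1" is the captured word, so only the text the pattern matched is rewritten.
        return m.replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "This is, and it is is a test test";
        System.out.println(removeDuplicateWords(text)); // prints: This is, and it is a test
    }
}
```

With the same input, the find/replaceAll loop would also rewrite the "is is" hidden in "This is," and drop that "is", whereas Matcher.replaceAll only touches the spans that the full pattern matched.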
Java regular expressions are explained very well in the API documentation of the Pattern class. After adding some spaces to indicate the different parts of the regular expression:
"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"
\b       match a word boundary
[a-z]+   match a word with one or more characters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
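To see why the word boundary after the back reference matters, here is a small sketch comparing the regex from the question (no trailing \b) with a variant that adds it. The class name, helper name, and sample text are made up for illustration:

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: true if the regex finds a "duplicate" anywhere in the text.
    public static boolean hasDuplicate(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String withoutBoundary = "\\b(\\w+)\\b\\s+\\1";    // regex from the question
        String withBoundary    = "\\b(\\w+)\\b\\s+\\1\\b"; // trailing \b added
        String text = "the theme";

        // Without the trailing \b, "the" followed by the start of "theme" counts as a repeat.
        System.out.println(hasDuplicate(withoutBoundary, text)); // prints: true
        // With it, the back reference must end at a word boundary, so nothing matches.
        System.out.println(hasDuplicate(withBoundary, text));    // prints: false
    }
}
```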