Matching a word with pound (#) symbol in a regex
Problem description
I have a regexp that checks whether some text contains a given word (delimited by word boundaries):
String regexp = ".*\\bSOME_WORD_HERE\\b.*";
but this regexp returns false when "SOME_WORD" starts with # (hashtag).
Example without #:
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true;
But with a hashtag, `contains` is false. Example:
text = "some text and #test word";
matchingWord = "#test";
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// contains == false, but I expect true
Recommended answer
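The \b# pattern matches a # only when it is preceded by a word character: a letter, digit or underscore. Since # is itself a non-word character, \b can only match before it if the character on its left is a word character.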
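To make this behavior concrete, here is a minimal sketch. The class name WordBoundaryDemo and the helper found are hypothetical names chosen for illustration; only Pattern and Matcher#find() come from the standard library.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class WordBoundaryDemo {
    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // A space precedes '#': both are non-word characters, so \b does not match there.
        System.out.println(found("\\b#test\\b", "some text and #test word")); // false
        // The word character 'c' precedes '#', so \b matches between them.
        System.out.println(found("\\b#test\\b", "abc#test word"));            // true
    }
}
```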
If you need to match a # that is not preceded by a word char, use a negative lookbehind, (?<!\w), instead of the leading \b. Similarly, so that the trailing boundary still works when the search word ends with a non-word char, use a negative lookahead, (?!\w), instead of the trailing \b:
text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");
Using Pattern.quote(matchingWord) is a good idea if your matchingWord can contain special regex metacharacters.
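As a rough sketch of how the lookarounds and Pattern.quote can be combined, the hypothetical helper below (QuotedWordMatch and containsWord are illustrative names, not from the original answer) treats the search word as literal text:

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class QuotedWordMatch {
    // True if `word` occurs in `text` with no word character directly before or after it.
    static boolean containsWord(String text, String word) {
        // Pattern.quote makes metacharacters in `word` (such as '.') match literally.
        String regex = "(?s).*(?<!\\w)" + Pattern.quote(word) + "(?!\\w).*";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsWord("some text and #test word", "#test"));    // true
        System.out.println(containsWord("some text and #testing word", "#test")); // false
        // Without quoting, the '.' in "#v1.0" would also match the 'x' here.
        System.out.println(containsWord("release #v1x0 today", "#v1.0"));         // false
    }
}
```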
Alternatively, if you plan to match your search words only between whitespace or at the start/end of the string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one:
text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
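The two boundary styles differ when punctuation touches the word. The sketch below contrasts them; the class and method names are hypothetical, and Pattern.quote is added on the assumption that the search word should be treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the two boundary styles described above.
class BoundaryStyles {
    // Boundaries: no word character directly before or after the word.
    static boolean nonWordBounded(String text, String word) {
        return text.matches("(?s).*(?<!\\w)" + Pattern.quote(word) + "(?!\\w).*");
    }

    // Boundaries: whitespace or start/end of string on both sides.
    static boolean whitespaceBounded(String text, String word) {
        return text.matches("(?s).*(?<!\\S)" + Pattern.quote(word) + "(?!\\S).*");
    }

    public static void main(String[] args) {
        String text = "see (#test) here";
        System.out.println(nonWordBounded(text, "#test"));    // true: '(' and ')' are non-word chars
        System.out.println(whitespaceBounded(text, "#test")); // false: '(' and ')' are not whitespace
    }
}
```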
And one more thing: wrapping the pattern in .* for .matches is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" used with Matcher#find() will be processed much more efficiently, but you will need to create a Matcher object for that.
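A minimal sketch of the Matcher#find() variant might look like this. FindWordDemo, containsWord and firstIndex are hypothetical names, and Pattern.quote is included on the assumption that the word is literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FindWordDemo {
    // Builds the pattern without the surrounding .* since find() scans the text itself.
    static Pattern wordPattern(String word) {
        return Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
    }

    static boolean containsWord(String text, String word) {
        Matcher m = wordPattern(word).matcher(text);
        return m.find();
    }

    // Index of the first whitespace-delimited occurrence, or -1 if there is none.
    static int firstIndex(String text, String word) {
        Matcher m = wordPattern(word).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("some text and #test word", "#test")); // true
        System.out.println(firstIndex("some text and #test word", "#test"));   // 14
    }
}
```

If the same word is searched in many texts, the compiled Pattern can be kept and reused rather than rebuilt on each call.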