Matching a word with a pound (#) symbol in a regex


Question

I have a regexp that checks whether some text contains a given word, delimited by word boundaries:
String regexp = ".*\\bSOME_WORD_HERE\\b.*";
However, this regexp returns false when the word starts with # (a hashtag).

Example without #:
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true; 

But with a hashtag, `contains` is false. Example:
text = "some text and #test word";
matchingWord = "#test"; 
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// contains == false, but I expect true

Recommended answer

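The \b# pattern only matches a # that is preceded by a word character: a letter, digit or underscore. In "some text and #test word" the # follows a space, and since both the space and the # are non-word characters, there is no word boundary between them and the match fails.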
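To make the behaviour of \b in front of # concrete, here is a minimal sketch. The class name BoundaryBeforeHashDemo and the helper found are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;

class BoundaryBeforeHashDemo {
    // Hypothetical helper: true if the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // '#' follows a space: space and '#' are both non-word chars, so \b cannot match there.
        System.out.println(found("\\b#test\\b", "some text and #test word"));  // prints false
        // '#' follows the letter 'a': a word/non-word transition, so \b matches.
        System.out.println(found("\\b#test\\b", "some text and a#test word")); // prints true
    }
}
```

The second call succeeds only because a letter sits directly before the #, which is the opposite of what the question wants.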

If you need to match a # that is not preceded by a word char, use a negative lookbehind, (?<!\w). Similarly, so that the trailing boundary still holds when the search word ends with a non-word char, replace the trailing \b with a negative lookahead, (?!\w):

text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");

Wrapping the word in Pattern.quote(matchingWord) is a good idea if matchingWord can contain special regex metacharacters.
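As a rough sketch of how the lookarounds and Pattern.quote fit together, something like the following could work. The class QuotedWordMatch and the method containsWord are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class QuotedWordMatch {
    // Hypothetical helper: true if `word` occurs in `text` with no word character
    // directly before or after it. Pattern.quote makes the word match literally.
    static boolean containsWord(String text, String word) {
        String regex = "(?s).*(?<!\\w)" + Pattern.quote(word) + "(?!\\w).*";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsWord("some text and #test word", "#test")); // prints true
        System.out.println(containsWord("price is $5.00 today", "$5.00"));     // prints true
        System.out.println(containsWord("price is $5x00 today", "$5.00"));     // prints false
    }
}
```

Without Pattern.quote, the $ and . in "$5.00" would be read as an end-of-input anchor and an any-character wildcard rather than as literal text.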

Alternatively, if you only want to match your search words when they sit between whitespace or at the start/end of the string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one:

text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
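The two kinds of boundary differ when punctuation touches the word. The sketch below contrasts them; WhitespaceBoundaryDemo and both method names are invented for this illustration.

```java
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // Hypothetical helper: no word character directly before or after the word.
    static boolean nonWordBounded(String text, String word) {
        return text.matches("(?s).*(?<!\\w)" + Pattern.quote(word) + "(?!\\w).*");
    }

    // Hypothetical helper: only whitespace or the string start/end around the word.
    static boolean whitespaceBounded(String text, String word) {
        return text.matches("(?s).*(?<!\\S)" + Pattern.quote(word) + "(?!\\S).*");
    }

    public static void main(String[] args) {
        String text = "see (#test) here";
        System.out.println(nonWordBounded(text, "#test"));    // prints true: '(' and ')' are non-word chars
        System.out.println(whitespaceBounded(text, "#test")); // prints false: '(' and ')' are not whitespace
    }
}
```

Which variant is appropriate depends on whether something like "(#test)" or "#test," should count as containing the word.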

And one more thing: wrapping the pattern in .* for .matches() is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" used with Matcher#find() will be processed in a much more optimized way, but you will need to create a Matcher object for that.
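A possible shape for the Matcher#find() variant is sketched below. FindWordDemo, wordPattern and indexOfWord are illustrative names only, and Pattern.quote is added here on the assumption that the search word should be treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindWordDemo {
    // Hypothetical helper: compiles the whitespace-bounded pattern for one search word.
    static Pattern wordPattern(String word) {
        return Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
    }

    // Hypothetical helper: start index of the first occurrence, or -1 if there is none.
    static int indexOfWord(String text, String word) {
        Matcher m = wordPattern(word).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWord("some text and #test word", "#test")); // prints 14
        System.out.println(indexOfWord("no hashtag here", "#test"));          // prints -1
    }
}
```

If the same word is checked against many texts, the compiled Pattern can be kept and reused instead of being rebuilt on each call.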
