Search for a word in a String
If I am looking for a particular word inside a string, for example "are" in the string "how are you", would a regular indexOf() work faster and better, or a regex matches()?
String testStr = "how are you";
String lookUp = "are";
//METHOD1
if (testStr.indexOf(lookUp) != -1)
{
System.out.println("Found!");
}
//OR
//METHOD 2
if (testStr.matches(".*"+lookUp+".*"))
{
System.out.println("Found!");
}
Which of the two methods above is a better way of looking for a string inside another string? Or is there a much better alternative?
- Ivard
If you don't care whether it's actually the entire word you're matching, then indexOf() will be a lot faster.
If, on the other hand, you need to be able to differentiate between are, harebrained, aren't etc., then you need a regex: \bare\b will only match are as an entire word (\\bare\\b in Java).
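The difference between the two checks can be sketched as follows. This is only an illustration: the class and method names (WholeWordSearch, containsSubstring, containsWord) are made up for the example, and Pattern.quote() is an addition so that a search term containing regex metacharacters is still treated literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class WholeWordSearch {

    // Plain substring check: true even if lookUp is only part of a longer word.
    static boolean containsSubstring(String text, String lookUp) {
        return text.indexOf(lookUp) != -1;
    }

    // Whole-word check: \b on both sides of the (literally quoted) search term.
    static boolean containsWord(String text, String lookUp) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(lookUp) + "\\b");
        // find() searches anywhere in the text, so no ".*" padding is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsSubstring("harebrained", "are")); // true
        System.out.println(containsWord("harebrained", "are"));      // false
        System.out.println(containsWord("aren't", "are"));           // false
        System.out.println(containsWord("how are you", "are"));      // true
    }
}
```

If the same word is looked up many times, compiling the Pattern once and reusing it avoids repeating the compilation cost on every call.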
\b is a word boundary anchor: it matches the empty position between an alphanumeric character (letter, digit, or underscore) and a non-alphanumeric character, or between an alphanumeric character and the start or end of the string.
Caveat: This also means that if your search term isn't actually a word (let's say you're looking for ###), then these word boundary anchors will only match in a string like aaa###zzz, but not in +++###+++.
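A small sketch of this caveat, using the two strings from the paragraph above. The class and method names are hypothetical and exist only for the demonstration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the word-boundary caveat.
class BoundaryCaveatDemo {

    // Searches for the literal term surrounded by \b anchors.
    static boolean foundWithWordBoundaries(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b")
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        // A boundary exists between 'a' and '#', and between '#' and 'z'.
        System.out.println(foundWithWordBoundaries("aaa###zzz", "###")); // true
        // '+' and '#' are both non-word characters, so there is no boundary.
        System.out.println(foundWithWordBoundaries("+++###+++", "###")); // false
        // A plain substring search still finds the term.
        System.out.println("+++###+++".indexOf("###") != -1);            // true
    }
}
```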
Further caveat: Java has by default a limited worldview on what constitutes an alphanumeric character. Only ASCII letters/digits (plus the underscore) count here, so word boundary anchors will fail on words like élève, relevé or ärgern. Read more about this (and how to solve this problem) here.
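One option that is commonly used for this, though not necessarily the one the linked article describes, is to compile the pattern with Pattern.UNICODE_CHARACTER_CLASS (available since Java 7) so that non-ASCII letters count as word characters. The sketch below assumes that flag. The class and method names are hypothetical, and the accented letters are written as Unicode escapes so the example does not depend on the source file encoding. How \b behaves without the flag may differ between Java releases, so only the flagged case is shown.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the flag choice is an assumption, not taken from the answer.
class UnicodeWordSearch {

    // Whole-word search in which \b uses Unicode-aware word characters.
    static boolean containsWordUnicode(String text, String lookUp) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(lookUp) + "\\b",
                                    Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String eleve = "\u00e9l\u00e8ve"; // the word "eleve" with its accents

        // The accented word stands alone between spaces.
        System.out.println(containsWordUnicode("un " + eleve + " ici", eleve)); // true
        // Here it is only the prefix of the plural form, so it is not a whole word.
        System.out.println(containsWordUnicode("des " + eleve + "s", eleve));   // false
    }
}
```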