Regex match for beginning of multiple words in string
Question
In JavaScript, I want to be able to match strings that begin with a certain phrase. However, I want it to match the start of any word in the phrase, not just the beginning of the phrase.
For example:
Phrase: This is the best
Need to match: th
Result: Matches Th and th
Edit: \b works great, but it raises another issue:
It also matches characters that follow non-ASCII ("foreign") letters. For example, if my string is "Männ" and I search for "n", it will match the n after Mä. Any ideas?
Recommended answer
"This is the best moth".match(/\bth/gi);
Or, with your string in a variable:
var string = "This is the best moth";
alert(string.match(/\bth/gi));
\b in a regex is a word boundary, so \bth will only match a th that is at the beginning of a word.
gi combines two flags: g for a global match (look for all occurrences) and i for case-insensitive matching.
(I threw moth in there as a reminder to check that it is not matched.)
jsFiddle example: http://jsfiddle.net/GNHX7/
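When the prefix comes from user input rather than a literal, the same \b pattern can be built at runtime. The following is a minimal sketch, not part of the original answer; the helper names escapeRegExp and matchWordStarts are hypothetical, and it assumes the prefix starts with an ASCII word character so that \b behaves as described.

```javascript
// Hypothetical helper: escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: find every occurrence of `prefix` at the start of a word.
function matchWordStarts(text, prefix) {
  const pattern = new RegExp("\\b" + escapeRegExp(prefix), "gi");
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(pattern) || [];
}

console.log(matchWordStarts("This is the best moth", "th")); // ["Th", "th"]
```

Escaping matters because a prefix such as "a.b" would otherwise treat the dot as "any character".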
Edit:
So, the above only returns the part that you match (th). If you want to return the entire words, you have to match the entire word.
This is where things get tricky fast. First, with no HTML entity letters:
string.match(/\bth[^\b]*?\b/gi);
To match the entire word, start at the word boundary \b, grab the th, then keep consuming characters until you reach the next word boundary \b. The * means zero or more of the preceding item, and the ? makes the match lazy: it doesn't expand as far as possible, but stops at the first opportunity, which here is the first word boundary after th.
Note that inside a character class \b means the backspace character, not a word boundary, so [^\b] actually matches any character except backspace. It is the lazy quantifier followed by the final \b that ends the match at the end of the word. The simpler pattern /\bth\w*/gi matches the same words.
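To see the whole-word pattern in action, here is a small sketch that runs it next to a \w*-based variant. The \w* form is an alternative introduced here for comparison, not something from the original answer.

```javascript
const text = "This is the best moth";

// Pattern from the answer: lazy match up to the next word boundary.
const lazy = text.match(/\bth[^\b]*?\b/gi);

// Alternative (an assumption of this sketch): consume word characters directly.
const simple = text.match(/\bth\w*/gi);

console.log(lazy);   // ["This", "the"]
console.log(simple); // ["This", "the"]
```

Both return whole words and skip moth, because neither pattern can start in the middle of a word.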
If you have characters that are written as HTML entities, like ä (&auml;), things get complicated really fast: \b only treats ASCII letters, digits, and the underscore as word characters, so it sees a word boundary next to ä. You then have to use whitespace, or whitespace plus a set of defined characters that may appear at word boundaries.
string.match(/\sth[^\s]*|^th[^\s]*/gi);
Since we're not using word boundaries, we have to take care of the beginning of the string separately (|^).
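The difference between the two approaches shows up on the "Männ" case from the question. This is an illustrative sketch: the sample text and variable names are assumptions, and the prefix n is substituted for th in the answer's whitespace-based pattern.

```javascript
// "Männ nice": the ä is written as an escape so the example does not depend on file encoding.
const text = "M\u00e4nn nice";

// \b treats ä as a non-word character, so it reports a boundary between ä and n.
const boundaryMatches = text.match(/\bn/gi);
console.log(boundaryMatches); // ["n", "n"]

// Whitespace-based pattern: only the n that starts "nice" qualifies.
const whitespaceMatches = text.match(/\sn[^\s]*|^n[^\s]*/gi);
console.log(whitespaceMatches); // [" nice"]
```

The first result includes the unwanted n inside Männ; the second does not, at the cost of including the leading space in the match.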
The whitespace-based pattern above will capture the whitespace at the beginning of words. Using \b will not capture whitespace, since \b has no width.
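If the captured whitespace is unwanted, two options are sketched below. Neither comes from the original answer. The first simply trims each match. The second assumes a runtime with ES2018 lookbehind and Unicode property escapes (the u flag), and uses a zero-width lookbehind as a Unicode-aware stand-in for \b; the wordChar definition is an assumption about what should count as part of a word.

```javascript
const text = "M\u00e4nn nice N\u00e4he"; // "Männ nice Nähe"

// Option 1: keep the whitespace-based pattern and trim each result.
const trimmed = (text.match(/\sn[^\s]*|^n[^\s]*/gi) || []).map((word) => word.trim());
console.log(trimmed); // ["nice", "Nähe"]

// Option 2: zero-width lookbehind that rejects a preceding letter, digit, or underscore.
const wordChar = "[\\p{L}\\p{N}_]"; // assumption: Unicode letters, numbers, underscore
const pattern = new RegExp("(?<!" + wordChar + ")n" + wordChar + "*", "giu");
const unicodeMatches = text.match(pattern) || [];
console.log(unicodeMatches); // ["nice", "Nähe"]
```

Like \b, the lookbehind has no width, so nothing before the word ends up in the match.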