How exactly do regular expression word boundaries work in PHP?
Question
I'm currently writing a library for matching specific words in content.
Essentially, it works by compiling words into regular expressions and running content through those regular expressions.
A feature I want to add is specifying whether a given word to match must start and/or end a word. For example, take the word cat. If I specify that it must start a word, then catering will match because cat is at the start, but ducat won't match because cat doesn't start the word.
I wanted to do this using word boundaries, but during some testing I found that they don't work as I'd expect.
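The feature described above could be sketched roughly as follows. This is only an illustration of the idea: the function name buildWordPattern and its two boolean flags are hypothetical and are not taken from the actual library.

```php
<?php
// Hypothetical helper: compiles $word into a case-insensitive pattern,
// optionally requiring a word boundary before and/or after it.
function buildWordPattern(string $word, bool $mustStart, bool $mustEnd): string
{
    // preg_quote() escapes regex metacharacters and the '/' delimiter.
    $body = preg_quote($word, '/');

    return '/' . ($mustStart ? '\b' : '') . $body . ($mustEnd ? '\b' : '') . '/i';
}

$startsWord = buildWordPattern('cat', true, false); // yields /\bcat/i

$matchesCatering = preg_match($startsWord, 'catering'); // 1: cat starts the word
$matchesDucat    = preg_match($startsWord, 'ducat');    // 0: cat is preceded by a word character

echo $startsWord, ' ', $matchesCatering, ' ', $matchesDucat, PHP_EOL;
```

This behaves as intended for words made only of word characters such as cat; words that begin or end with a symbol are where the surprises start.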
Take the following:
preg_match("/(^|\b)@nimal/i", "something@nimal", $match);
preg_match("/(^|\b)@nimal/i", "something!@nimal", $match);
In the statements above I would expect the following results:
> false
> 1 (@nimal)
But the results are the opposite:
> 1 (@nimal)
> false
In the first, I would expect it to fail, as the group will eat the @, leaving nimal to match against @nimal, which it obviously doesn't. Instead, the group matches an empty string, so @nimal is matched, meaning @ is considered to be part of the word.
In the second, I would expect the group to eat the !, leaving @nimal to match the rest (which it should). Instead, it appears to combine the ! and @ together to form a word, which is confirmed by the following match:
preg_match("/g\b!@\bn/i", "something!@nimal", $match);
Any ideas why regular expressions do this?
I'd just love a page that clearly documents how word boundaries are determined; I can't find one for the life of me.
Answer
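The word boundary \b is a zero-width assertion: it consumes no characters, so the group never "eats" anything. It matches at any position where a \w (a word character: by default a letter, digit, or underscore) sits next to a \W (a non-word character), in either order, and also at the start or end of the string when the first or last character is a \w. In your pattern the \b comes directly before the @, which is a \W character, so the pattern can only match if there is a word character immediately before the @. The ^ alternative only helps when @nimal is at the very start of the string.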
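One way to see where the boundaries fall is to ask PCRE for every position at which \b matches. The sketch below does that for the two test strings; the helper name wordBoundaryOffsets is made up for this illustration, and the offsets are byte offsets, which coincide with character positions here because the strings are plain ASCII.

```php
<?php
// Hypothetical helper: returns the byte offsets at which \b matches in $subject.
function wordBoundaryOffsets(string $subject): array
{
    // With PREG_OFFSET_CAPTURE each match is a [matched text, offset] pair;
    // \b is zero-width, so the matched text is always an empty string.
    preg_match_all('/\b/', $subject, $matches, PREG_OFFSET_CAPTURE);

    return array_column($matches[0], 1);
}

// g is at offset 8 and @ at offset 9, so offset 9 is the g|@ boundary.
echo implode(', ', wordBoundaryOffsets('something@nimal')), PHP_EOL;  // 0, 9, 10, 15

// ! is at offset 9 and @ at offset 10; offset 10 (between ! and @) is absent.
echo implode(', ', wordBoundaryOffsets('something!@nimal')), PHP_EOL; // 0, 9, 11, 16
```

In the second string there is a boundary before the ! and another after the @, but none between them, which is also why /g\b!@\bn/i matches.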
something@nimal
        ^^
==> Match, because there is a word boundary between g and @.
something!@nimal
         ^^
==> NO match, because there is no word boundary between ! and @: both characters are \W.
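The two results can be reproduced directly, and if the behaviour the question expected is what is actually wanted, a negative lookbehind is one possible way to get it. The (?<!\w) variant below is an assumption about the intent ("the keyword must not be preceded by a word character"), not something proposed in the original answer. Note also that preg_match() returns 0 when there is no match; false is returned only on error.

```php
<?php
// The pattern from the question: ^ or a word boundary, then the keyword.
$original = '/(^|\b)@nimal/i';

$first  = preg_match($original, 'something@nimal');  // 1: boundary between g and @
$second = preg_match($original, 'something!@nimal'); // 0: ! and @ are both \W

// Assumed alternative: succeed only if the previous character is NOT a word
// character. This also succeeds at the start of the string.
$strict = '/(?<!\w)@nimal/i';

$strictFirst  = preg_match($strict, 'something@nimal');  // 0: g precedes @
$strictSecond = preg_match($strict, 'something!@nimal'); // 1: ! precedes @
$strictStart  = preg_match($strict, '@nimal');           // 1: nothing precedes @

echo "$first $second $strictFirst $strictSecond $strictStart", PHP_EOL; // 1 0 0 1 1
```

A matching (?!\w) lookahead after the keyword would play the same role for the "must end a word" case.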