How exactly do regular expression word boundaries work in PHP?

Question

I'm currently writing a library for matching specific words in content.

Essentially, it works by compiling words into regular expressions and running content through those regular expressions.

A feature I want to add is specifying whether a given word must start and/or end a word. For example, take the word cat. If I specify that it must start a word, catering will match because cat is at the start, but ducat won't match because cat doesn't start the word.
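
A minimal sketch of this feature, assuming the library compiles one pattern per word. The function name compileWordPattern and its $mustStart/$mustEnd flags are hypothetical, not taken from the original library. It escapes the word with preg_quote and adds \b on each side that must coincide with a word edge:

```php
<?php
// Hypothetical helper: compile a word into a PCRE pattern, optionally
// requiring it to start and/or end a word by adding \b on that side.
function compileWordPattern(string $word, bool $mustStart, bool $mustEnd): string
{
    // Escape regex metacharacters, including the "/" delimiter.
    $quoted = preg_quote($word, '/');
    $prefix = $mustStart ? '\b' : '';
    $suffix = $mustEnd ? '\b' : '';
    return '/' . $prefix . $quoted . $suffix . '/i';
}

$startsWord = compileWordPattern('cat', true, false); // "/\bcat/i"

var_dump(preg_match($startsWord, 'catering')); // int(1): "cat" starts the word
var_dump(preg_match($startsWord, 'ducat'));    // int(0): "cat" sits inside the word
```

Because \b is defined in terms of \w, this only behaves as intended when the word's first and last characters are themselves word characters. For a word such as @nimal, a leading \b requires a word character before the @, which is the behaviour the rest of this question runs into.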

I wanted to do this using word boundaries, but during some testing I found it doesn't work as I'd expect it to.

Take the following:

preg_match("/(^|\b)@nimal/i", "something@nimal", $match);
preg_match("/(^|\b)@nimal/i", "something!@nimal", $match);

For the statements above, I would expect the following results:

> 0 (no match)
> 1 (@nimal)

But the results are the reverse:

> 1 (@nimal)
> 0 (no match)
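
A short script to reproduce these results. Note that preg_match returns 1 for a match, 0 for no match, and false only when an error occurs:

```php
<?php
$pattern = '/(^|\b)@nimal/i';

foreach (['something@nimal', 'something!@nimal'] as $subject) {
    $result = preg_match($pattern, $subject, $match);
    // $match[0] holds the matched text when $result === 1.
    echo $subject, ' => ', $result, $result === 1 ? ' (' . $match[0] . ')' : '', PHP_EOL;
}
// something@nimal => 1 (@nimal)
// something!@nimal => 0
```

In the first subject the empty \b alternative succeeds at offset 9, between g and @, so group 1 captures an empty string. In the second, no position before the @ satisfies either ^ or \b.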

In the first, I would expect it to fail, because the group would eat the @, leaving nimal to match against @nimal, which it obviously doesn't. Instead, the group matches an empty string, so @nimal is matched, which suggests @ is being treated as part of the word.

In the second, I would expect the group to eat the !, leaving @nimal to match the rest (which it should). Instead, it appears to combine the ! and @ into a single word, which the following match seems to confirm:

preg_match("/g\b!@\bn/i", "something!@nimal", $match);
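
One way to see what is going on is to list every byte offset at which \b matches, using preg_match_all with PREG_OFFSET_CAPTURE (each match is empty, so only the offsets matter). boundaryOffsets is a hypothetical helper name:

```php
<?php
// Hypothetical helper: byte offsets at which \b matches in $s.
function boundaryOffsets(string $s): array
{
    preg_match_all('/\b/', $s, $matches, PREG_OFFSET_CAPTURE);
    // Each entry is [matchedText, offset]; the matched text is always "".
    return array_column($matches[0], 1);
}

echo implode(', ', boundaryOffsets('something@nimal')), PHP_EOL;  // 0, 9, 10, 15
echo implode(', ', boundaryOffsets('something!@nimal')), PHP_EOL; // 0, 9, 11, 16
```

In something!@nimal the boundaries sit at offset 9 (between g and !) and offset 11 (between @ and n), which is why g\b!@\bn matches. There is none at offset 10, the position between ! and @, because both of those characters are non-word characters.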

Any ideas why the regular expression behaves this way?

I'd love a page that clearly documents how word boundaries are determined, but I can't find one for the life of me.

Recommended answer

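The word boundary \b matches at a position where a \w (word) character meets a \W (non-word) character, in either direction; the start and end of the string count as \W for this purpose. Your pattern requires a \b immediately before the @, which is itself a \W character, so it can only match when a word character comes right before the @: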

something@nimal
        ^^

==>由于g@之间的单词边界而匹配.

==> Match because of the word boundary between g and @.

something!@nimal
         ^^ 

==> NO匹配,因为在!@之间没有单词边界,两个字符均为\W

==> No match, because there is no word boundary between ! and @: both characters are \W.
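
The rule above can be written out directly. The sketch below (hypothetical helper names isWordChar and isWordBoundary) decides whether \b would match at a given byte offset: exactly one of the two neighbouring characters must be a word character, with positions outside the string counting as non-word. It approximates \w with ASCII [A-Za-z0-9_], which is adequate for plain ASCII subjects like the ones here but does not model PCRE's locale- or Unicode-dependent behaviour.

```php
<?php
// Hypothetical helper: approximates \w with ASCII letters, digits and "_".
function isWordChar(string $char): bool
{
    return preg_match('/^[A-Za-z0-9_]$/', $char) === 1;
}

// Hypothetical helper: true if \b would match at byte offset $pos of $s.
// Positions before the start and after the end count as non-word.
function isWordBoundary(string $s, int $pos): bool
{
    $before = $pos > 0 && isWordChar($s[$pos - 1]);
    $after = $pos < strlen($s) && isWordChar($s[$pos]);
    // A boundary is a change between word and non-word, in either direction.
    return $before !== $after;
}

var_dump(isWordBoundary('something@nimal', 9));   // bool(true):  g|@
var_dump(isWordBoundary('something!@nimal', 10)); // bool(false): !|@
```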
