Word boundary does not work inside brackets in regex
Question
I have noticed that the word boundary, as in \bword\b, does not work inside brackets when doing a preg_replace() in PHP.
Specifically, I'm trying to exclude the full word &gt; (which stands for > in HTML), but since the word boundary does not trigger inside brackets, as in [^\b&gt;\b], any of those characters by itself, like g or &, will be detected as a non-match. If you try to do a match outside the brackets, \b works as expected in PHP, even though the word starts with &, a non-word character.
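The behaviour described above follows from how PCRE reads a character class: inside brackets, \b stands for the backspace character (0x08) rather than a word boundary, so the class excludes single characters and not the whole entity. A minimal sketch to check this, where the test strings are arbitrary samples chosen for illustration:

```php
<?php
// Inside [...], PCRE treats \b as a backspace (0x08), not a word boundary.
// So [^\b&gt;\b] means: any one character except backspace, '&', 'g', 't' or ';'.
$class = '/[^\b&gt;\b]/';

$gExcluded         = preg_match($class, 'g');    // 0: 'g' alone is rejected
$ampExcluded       = preg_match($class, '&');    // 0: '&' alone is rejected
$backspaceExcluded = preg_match($class, "\x08"); // 0: backspace is in the class
$otherAllowed      = preg_match($class, 'x');    // 1: any other character matches

var_dump($gExcluded, $ampExcluded, $backspaceExcluded, $otherAllowed);
```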
Any thoughts/ideas to get around this situation?
Recommended answer
For exclusion in PHP, (*SKIP)(*F) is your friend
In PHP, excluding anything is frighteningly simple thanks to the powerful (*SKIP)(*F) syntax (also available in Perl).
To exclude &gt; and match something else, you can just do this:
&gt;(*SKIP)(*F)|something_else
The left side of the alternation | matches the complete &gt; and then deliberately fails, after which the engine skips ahead and resumes the search just past the text it had matched. The right side matches something_else, and we know that it is not &gt; because it was not matched by the expression on the left. Just make sure that something_else is not something generic such as .*, as that could roll over all the following &gt; instances. For instance, here, \w+ would be a perfectly fine pattern for something_else, as it does not clash with &gt;.
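As an illustration of the pattern above, here is a minimal sketch with preg_replace(). The input string and the bracket-wrapping replacement are assumptions made up for this example; only the &gt;(*SKIP)(*F)|\w+ pattern comes from the answer.

```php
<?php
// Hypothetical sample input: one &gt; entity plus a standalone "gt" word.
$input = 'a &gt; b gt';

// Plain \w+ also matches the "gt" inside the entity and damages it.
$naive = preg_replace('/\w+/', '[$0]', $input);

// With (*SKIP)(*F), the whole &gt; is consumed and discarded first,
// so \w+ only sees the text outside the entity.
$safe = preg_replace('/&gt;(*SKIP)(*F)|\w+/', '[$0]', $input);

echo $naive, "\n"; // [a] &[gt]; [b] [gt]
echo $safe, "\n";  // [a] &gt; [b] [gt]

// A generic right-hand side such as .* starts before the entity
// and rolls over it, so the exclusion never gets a chance to apply.
preg_match('/&gt;(*SKIP)(*F)|.*/', $input, $m);
echo $m[0], "\n";  // a &gt; b gt
```

The standalone gt is still wrapped in the safe version, which shows that the exclusion applies to the full entity and not to its individual characters.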
Further reading on this and other techniques for excluding patterns in regex