Why does \B work but not \b?

Problem Description

I wanted to match words ending with #, as in

hi hello# world#

I tried to use a word boundary:

\b\w+#\b

and it doesn't match. I thought \b is a non-word boundary, but that doesn't seem to be the case here.

Surprisingly,

\b\w+#\B

matches!
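A minimal reproduction of the two patterns, assuming Python's re module as the regex engine (the question does not name a flavor; for this ASCII-only input Python's \b and \B follow the definitions given in the answer):

```python
import re

text = "hi hello# world#"

# \b after '#': '#' is a non-word character, and so is what follows it
# (a space, or the end of the string), so there is no word boundary there.
with_b = re.findall(r"\b\w+#\b", text)

# \B after '#': both sides are non-word, which is exactly a non-word boundary.
with_B = re.findall(r"\b\w+#\B", text)

print(with_b)  # []
print(with_B)  # ['hello#', 'world#']
```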

So why does \B work here and not \b? Also, why doesn't \b work in this case?

NOTE: Yes, we can use \b\w+#(?=\s|$), but I want to know why \B works in this case!
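The lookahead alternative from the note can be checked the same way, again as a Python sketch with made-up inputs. The lookahead states the requirement explicitly (whitespace or end of string after #), while \B is looser: it accepts any non-word character after #, so the two can disagree on input such as x#!.

```python
import re

text = "hi hello# world#"

# Explicit lookahead: '#' must be followed by whitespace or the end of the string.
explicit = re.findall(r"\b\w+#(?=\s|$)", text)
# Non-word boundary: '#' must be followed by anything that is not a word character.
non_boundary = re.findall(r"\b\w+#\B", text)

print(explicit)      # ['hello#', 'world#']
print(non_boundary)  # ['hello#', 'world#']

# The two differ when '#' is followed by punctuation such as '!'.
punct_explicit = re.findall(r"\b\w+#(?=\s|$)", "x#!")
punct_non_boundary = re.findall(r"\b\w+#\B", "x#!")
print(punct_explicit)      # []
print(punct_non_boundary)  # ['x#']
```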

Recommended Answer

Definition of word boundary \b

Defining a word boundary in terms of "words" is imprecise. Let me instead define the word boundary with look-ahead, look-behind, and the shorthand word character class \w.

A word boundary \b is equivalent to:

(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))

This means:

  • Right ahead, there is (at least) one character that is a word character, and right behind, we cannot find a word character (either that character is not a word character, or we are at the start of the string).
  • Or, right behind, there is a word character, and right ahead, we cannot find a word character (either that character is not a word character, or we are at the end of the string).

(Note how similar this is to the expansion of XOR into conjunction and disjunction.)
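The equivalence above can be spot-checked in Python by collecting every position where each pattern matches; both are zero-width, so re.finditer reports one empty match per matching position. The sample strings and the helper name positions are arbitrary choices for illustration, and a check on a few inputs is a sanity test, not a proof.

```python
import re

WORD_BOUNDARY = re.compile(r"\b")
# \b spelled out with lookarounds: word char ahead and none behind, or vice versa.
EXPANDED = re.compile(r"(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))")

def positions(pattern, s):
    """Return every index in s (0..len(s)) where the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(s)]

samples = ["hi hello# world#", "", "#", "a", "a_b c-d", "__init__()"]
for s in samples:
    assert positions(WORD_BOUNDARY, s) == positions(EXPANDED, s), s

print(positions(WORD_BOUNDARY, "hi hello# world#"))  # [0, 2, 3, 8, 10, 15]
```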

A non-word boundary \B is equivalent to:

(?:(?<!\w)(?!\w)|(?<=\w)(?=\w))

This means:

  • Right ahead and right behind, we cannot find any word character. Note that under this definition, the empty string is considered a non-word boundary.
  • Or, right ahead and right behind, both characters are word characters.

(Note how similar this is to the expansion of XNOR into conjunction and disjunction.)
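A matching Python sketch for \B checks the lookaround expansion against \B and also confirms that at every position exactly one of \b and \B matches, which is the XOR/XNOR relationship noted above. The sample strings and the helper positions are illustrative choices; an empty input is deliberately left out of the samples.

```python
import re

WORD_BOUNDARY = re.compile(r"\b")
NON_BOUNDARY = re.compile(r"\B")
# \B spelled out: no word character on either side, or word characters on both sides.
EXPANDED = re.compile(r"(?:(?<!\w)(?!\w)|(?<=\w)(?=\w))")

def positions(pattern, s):
    """Return every index in s where the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(s)]

samples = ["hi hello# world#", "#", "a", "a_b c-d", "__init__()"]
for s in samples:
    non_b = positions(NON_BOUNDARY, s)
    assert non_b == positions(EXPANDED, s), s
    # Every index 0..len(s) is either a word boundary or a non-word boundary, never both.
    b = positions(WORD_BOUNDARY, s)
    assert sorted(b + non_b) == list(range(len(s) + 1)), s

print(positions(NON_BOUNDARY, "hi hello# world#"))
# [1, 4, 5, 6, 7, 9, 11, 12, 13, 14, 16]
```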

Since the definitions of \b and \B depend on the definition of \w [1], you need to consult the documentation of your specific regex flavor to know exactly what \w matches.

[1] Most regex flavors define \b in terms of \w. Well, except for Java [Point 9], where in default mode \w is ASCII-only while \b is partially Unicode-aware.

  • In JavaScript, \w would be [A-Za-z0-9_] in default mode.

  • In .NET, \w by default matches [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\p{Mn}\p{Nd}\p{Pc}], and it behaves the same as in JavaScript if the ECMAScript option is specified. Of the characters in the Pc (connector punctuation) category, all you need to know is that the space (ASCII 32) is not among them.
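Python's re module, which the answer does not cover, shows the same dependence of \b on \w: in Python 3, \w in str patterns is Unicode-aware by default, and the re.ASCII flag restricts it to [a-zA-Z0-9_], which moves \b along with it. A short sketch with a made-up input string:

```python
import re

text = "caf\u00e9 au lait"  # "café au lait", with é written as an escape

# Default (Unicode) mode: 'é' is a word character, so "café" is one word.
unicode_words = re.findall(r"\w+", text)
# ASCII mode: \w is [a-zA-Z0-9_], so 'é' splits the word.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# \b follows \w: only in ASCII mode is there a boundary between 'f' and 'é'.
unicode_caf = re.findall(r"\bcaf\b", text)
ascii_caf = re.findall(r"\bcaf\b", text, flags=re.ASCII)

print(unicode_words)  # ['café', 'au', 'lait']
print(ascii_words)    # ['caf', 'au', 'lait']
print(unicode_caf)    # []
print(ascii_caf)      # ['caf']
```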

With the definitions above, answering the question becomes easy:

"hi hello# world#"

In hello#, the character after # is a space (U+0020, in the Zs category), which is not a word character, and # itself is not a word character either (in Unicode, it is in the Po category). Therefore, \B can match here. The branch (?<!\w)(?!\w) is used in this case.

In world#, the character after # is the end of the string. Since # is not a word character, and we cannot find any word character ahead (there is nothing there), \B can match the empty string just after #. The branch (?<!\w)(?!\w) is also used in this case.

By the same reasoning, \b cannot match at either of these two positions: neither side is a word character, so neither branch of the \b expansion applies. That is why \b\w+#\b finds no match in this string.
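To see which branch does the work, a Python sketch can list where each branch of the \B expansion matches in the sample string and compare that with where the matches of \b\w+#\B end (indices count characters from 0; the helper positions is introduced only for this illustration):

```python
import re

text = "hi hello# world#"

# The two branches of the \B expansion, tested separately.
neither_side_word = re.compile(r"(?<!\w)(?!\w)")
both_sides_word = re.compile(r"(?<=\w)(?=\w)")

def positions(pattern, s):
    """Return every index in s where the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(s)]

# Index 9 sits between '#' and the space; index 16 is the end of the string.
print(positions(neither_side_word, text))  # [9, 16]
print(positions(both_sides_word, text))    # [1, 4, 5, 6, 7, 11, 12, 13, 14]

# Both matches of \b\w+#\B end exactly at the two "neither side" positions.
ends = [m.end() for m in re.finditer(r"\b\w+#\B", text)]
print(ends)  # [9, 16]

# \b never matches at those positions, which is why \b\w+#\b finds nothing.
boundary_hits = [p for p in positions(re.compile(r"\b"), text) if p in ends]
print(boundary_hits)  # []
```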

Alan Moore gives a pretty good summary in the comment (http://stackoverflow.com/questions/16623181/why-does-b-works-but-not-b/16624542?noredirect=1#comment23906040_16624542):

I think the key point to remember is that regexes can't read. That is, they don't deal in words, only in characters. When we say \b matches the beginning or end of a word, we don't mean it identifies a word and then seeks out its endpoints, like a human would. All it can see is the character before the current position and the character after the current position. Thus, \b only indicates that the current position could be a word boundary. It's up to you to make sure the characters on either side are what they should be.
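A small Python illustration of that point, using made-up input strings: \b only inspects the two neighbouring characters, so the \b pattern from the question does match when # is immediately followed by a letter, even though a human might read hello#world as one token. Spelling out what must surround the match with lookarounds is one way to make sure the characters on either side are what they should be; the whitespace-based requirement below is an assumption chosen for the example.

```python
import re

# \b after '#' succeeds here, because the next character 'w' is a word character.
glued = re.findall(r"\b\w+#\b", "hello#world")
print(glued)  # ['hello#']

# Stating the requirement explicitly: start of string or whitespace before,
# whitespace or end of string after.
explicit = re.compile(r"(?<!\S)\w+#(?!\S)")
glued_explicit = explicit.findall("hello#world")
spaced_explicit = explicit.findall("hi hello# world#")
print(glued_explicit)   # []
print(spaced_explicit)  # ['hello#', 'world#']
```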
