.NET regex: what is the word character \w?
Question
Simple question: What is the pattern for the word character \w in C#, .NET?
My first thought was that it matches [A-Za-z0-9_], and the documentation tells me:
Character class   Description                   Pattern   Matches
\w                Matches any word character.   \w        "I", "D", "A", "1", "3" in "ID A1.3"
which is not very helpful.
And \w seems to match äöü, too. What else? Is there a better (exact) definition available?
Recommended answer
Word Character: \w
\w matches any word character. A word character is a member of any of the Unicode categories listed in the following table.
- Ll (Letter, Lowercase)
- Lu (Letter, Uppercase)
- Lt (Letter, Titlecase)
- Lo (Letter, Other)
- Lm (Letter, Modifier)
- Nd (Number, Decimal Digit)
- Pc (Punctuation, Connector). This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.
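To see how the categories listed above line up with what \w actually matches, here is a minimal sketch that prints the Unicode category of a few characters next to the result of a \w test. The class name WordCharDemo, the helper IsWordChar, and the sample characters are made up for illustration; only Regex.IsMatch and CharUnicodeInfo.GetUnicodeCategory come from the framework. Later revisions of the documentation reportedly also list Mn (Mark, Nonspacing); the sketch does not exercise that category.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of any library.
public static class WordCharDemo
{
    // True if the single character c is matched by \w with default options.
    // \A and \z anchor the match to the whole one-character string.
    public static bool IsWordChar(char c) =>
        Regex.IsMatch(c.ToString(), @"\A\w\z");

    public static void Main()
    {
        // Sample characters picked for illustration, one or more per category.
        char[] samples =
        {
            'a',      // Ll (Letter, Lowercase)
            'Z',      // Lu (Letter, Uppercase)
            '\u00E4', // Ll, the "ä" from the question
            '\u01C5', // Lt (Letter, Titlecase)
            '\u05D0', // Lo (Letter, Other)
            '\u02B0', // Lm (Letter, Modifier)
            '7',      // Nd (Number, Decimal Digit)
            '\u0663', // Nd, Arabic-Indic digit three
            '_',      // Pc, LOW LINE U+005F
            '\u203F', // Pc, UNDERTIE
            '-',      // Pd (dash), not in the list
            ' '       // Zs (space), not in the list
        };

        foreach (char c in samples)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            Console.WriteLine($"U+{(int)c:X4} {category,-22} \\w: {IsWordChar(c)}");
        }
    }
}
```

Under the definition quoted above, every sample except the hyphen and the space should report a match, which is also why äöü from the question count as word characters.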
If ECMAScript-compliant behavior is specified, \w is equivalent to [a-zA-Z_0-9].
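The difference between the default behavior and the ECMAScript-compliant one can be sketched as follows. RegexOptions.ECMAScript is the real framework option; the class name EcmaScriptWordDemo and the two helper methods are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of any library.
public static class EcmaScriptWordDemo
{
    // Default options: \w uses the Unicode categories.
    public static bool MatchesDefault(string s) =>
        Regex.IsMatch(s, @"^\w+$");

    // ECMAScript option: \w is restricted to [a-zA-Z_0-9].
    public static bool MatchesEcma(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);

    public static void Main()
    {
        string umlauts = "\u00E4\u00F6\u00FC"; // "äöü" from the question
        string ascii = "abc_123";

        Console.WriteLine(MatchesDefault(umlauts)); // True
        Console.WriteLine(MatchesEcma(umlauts));    // False
        Console.WriteLine(MatchesDefault(ascii));   // True
        Console.WriteLine(MatchesEcma(ascii));      // True
    }
}
```

If only ASCII identifiers should be accepted, passing RegexOptions.ECMAScript or writing the class [a-zA-Z_0-9] out explicitly are two ways to get there; which one reads better is a matter of taste.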
See also
- Unicode Character Database
- Unicode Characters in the 'Punctuation, Connector' Category