RegEx: \w - "_" + "-" in UTF-8
Question
I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but doesn't match underscores (_). I tried these silly attempts without success:
- ([\w-^_])+
- ([\w^_]-?)+
- (\w[^_]-?)+
\w is shorthand for [A-Za-z0-9_], but it also matches UTF-8 characters if I have the u modifier set.
Can anyone help me?
Recommended answer
Try this:
(?:[\w\-](?<!_))+
It does a simple match on anything that \w matches (or a dash), and then uses a zero-width lookbehind to ensure that the character just matched is not an underscore.
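The lookbehind approach can be tried out with a minimal sketch. The question is about PHP's PCRE with the u modifier; this sketch instead assumes Python's re module, whose \w is Unicode-aware for str patterns by default, and the helper name runs is hypothetical.

```python
import re

# Pattern from the answer: match a word character or a dash, then use a
# negative lookbehind to reject it if the character just consumed was "_".
PATTERN = re.compile(r"(?:[\w\-](?<!_))+")

def runs(text):
    """Return every maximal run of letters, digits and dashes in text."""
    return PATTERN.findall(text)

# An underscore ends the current run; a dash does not.
print(runs("foo_bar-baz"))
print(runs("na\u00efve_caf\u00e9-42"))
```

Because the group is non-capturing, findall returns the whole matched runs rather than the last character of each run.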
Otherwise you could pick this one:
(?:[^_\W]|-)+
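which is a more set-based approach (note the uppercase W): the negated class [^_\W] accepts any character that is neither an underscore nor a non-word character, and the alternation adds the dash back in.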
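A short sketch of the set-based variant used as a whole-string validator. It again assumes Python's re module rather than PHP's PCRE, and the function name is_valid is made up for illustration.

```python
import re

# [^_\W] is a double negation: "not an underscore and not a non-word
# character", i.e. \w minus "_". The alternation re-admits the dash.
SET_BASED = re.compile(r"(?:[^_\W]|-)+")

def is_valid(token):
    """True if token consists only of letters, digits and dashes."""
    return SET_BASED.fullmatch(token) is not None

print(is_valid("caf\u00e9-42"))  # letters, digits and a dash only
print(is_valid("foo_bar"))       # contains an underscore
```

Using fullmatch instead of search makes the pattern validate the entire string, which is probably what a slug or username check wants.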
OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:
[\p{L}\p{N}\-]+
\p{L} matches any Unicode character that qualifies as a letter (note: not a word character, thus no underscores), while \p{N} matches anything that counts as a number (including Roman numerals and more exotic things).
\- is just an escaped dash. Although not strictly necessary, I make it a point to escape dashes in character classes. Note that there are dozens of different dashes in Unicode, which gives rise to the following version:
[\p{L}\p{N}\p{Pd}]+
Here "Pd" is Punctuation, Dash, which includes, but is not limited to, our minus-dash-thingy (the ASCII hyphen-minus). Note that, again, there is no underscore here: the underscore belongs to the separate category Pc (Punctuation, Connector).
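To see which characters these categories cover, here is a rough sketch. Python's built-in re module does not support \p{...}, so this assumes the standard unicodedata module as a stand-in for the L, N and Pd properties. The function names allowed and is_valid are hypothetical.

```python
import unicodedata

def allowed(ch):
    """Mimic [\\p{L}\\p{N}\\p{Pd}]: any letter, any number, or a dash."""
    cat = unicodedata.category(ch)  # two-letter category such as "Ll" or "Pd"
    return cat[0] in ("L", "N") or cat == "Pd"

def is_valid(text):
    """True if text is non-empty and every character is allowed."""
    return len(text) > 0 and all(allowed(ch) for ch in text)

# Hyphen-minus, en dash, Roman numeral eight, underscore.
for ch in ["-", "\u2013", "\u2167", "_"]:
    print(repr(ch), unicodedata.category(ch), allowed(ch))
```

The loop prints the category of each sample character, which shows why the en dash and the Roman numeral pass while the underscore does not.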