RegEx: \w - "_" + "-" in UTF-8

Published: 2020/5/27 2:34:08. Tags: php, regex, unicode, utf-8, pcre

Question

I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but not underscores (_). I tried these silly attempts without success:

([\w-^_])+
([\w^_]-?)+
(\w[^_]-?)+

\w is shorthand for [A-Za-z0-9_], but with the u modifier set it also matches UTF-8 characters.
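
The premise can be reproduced outside PHP. This sketch assumes Python's re module as a stand-in: it is not PCRE, but for str patterns its \w is Unicode-aware by default (roughly like \w under PHP's u modifier), and the re.ASCII flag narrows it to [A-Za-z0-9_]. The sample string is hypothetical. In both modes the underscore is still matched, which is exactly the problem.

```python
import re

SAMPLE = "café_1"  # hypothetical test string: a non-ASCII letter plus an underscore

# Default (Unicode) \w: letters and digits from any script, plus '_'.
unicode_match = re.fullmatch(r"\w+", SAMPLE)

# re.ASCII restricts \w to [A-Za-z0-9_], so 'é' no longer matches.
ascii_match = re.fullmatch(r"\w+", SAMPLE, flags=re.ASCII)

print(bool(unicode_match))  # the underscore is accepted along with 'é'
print(bool(ascii_match))    # 'é' makes the ASCII-only match fail
```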

Can anyone help me?

Answer

Try this:

(?:[\w\-](?<!_))+

It matches anything covered by \w (or a dash), then uses a zero-width negative lookbehind to ensure that the character just matched is not an underscore.
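
A quick way to sanity-check the lookbehind pattern is to run it through Python's re module, which supports the same (?<!_) syntax. This is a sketch under assumptions, not PHP code: it uses re.fullmatch so that the whole string must match rather than just a substring, and the helper name is_valid is hypothetical.

```python
import re

# [\w\-] consumes one word character or a dash; (?<!_) then looks back at
# that character and rejects it if it was an underscore.
LOOKBEHIND_PATTERN = re.compile(r"(?:[\w\-](?<!_))+")


def is_valid(text: str) -> bool:
    """Return True if text consists only of word characters (minus '_') and '-'."""
    return LOOKBEHIND_PATTERN.fullmatch(text) is not None


for sample in ["straße-2020", "naïve-Ω", "snake_case", ""]:
    print(f"{sample!r}: {is_valid(sample)}")
```

The empty string is rejected because the group is quantified with +, so at least one character is required.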

Alternatively, you could use this one:

(?:[^_\W]|-)+

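This is a more set-based approach (note the uppercase W): [^_\W] matches any character that is neither an underscore nor a non-word character, i.e. every word character except the underscore.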
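
The set-based pattern can be checked the same way. This sketch again assumes Python's re module as a stand-in for PCRE (Python also allows \W inside a character class); the function name is_valid_set_based is hypothetical, and re.fullmatch is used so the entire string has to match.

```python
import re

# [^_\W]: exclude '_' and exclude every non-word character (\W),
# leaving "word characters minus the underscore". '-' is added back via |.
SET_BASED_PATTERN = re.compile(r"(?:[^_\W]|-)+")


def is_valid_set_based(text: str) -> bool:
    """Return True if the whole string matches the set-based pattern."""
    return SET_BASED_PATTERN.fullmatch(text) is not None


for sample in ["Ärger-3", "über_alles", "--"]:
    print(f"{sample!r}: {is_valid_set_based(sample)}")
```

Note that, like the other patterns in this answer, it also accepts a string made only of dashes, such as "--".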

OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:

[\p{L}\p{N}\-]+

\p{L} matches any Unicode character that qualifies as a Letter (note: a letter, not a word character, so no underscores), while \p{N} matches any character in the Unicode Number category (including Roman numerals and more exotic things).
\- is just an escaped dash. Although not strictly necessary, I make a point of escaping dashes in character classes. Note that Unicode has many different dash characters, which gives rise to the following version:

[\p{L}\p{N}\p{Pd}]+

Here "Pd" is the Unicode category Punctuation, Dash, which includes, but is not limited to, our minus-dash-thingy (the ASCII hyphen-minus). (Note, again, no underscore here: "_" belongs to Pc, Connector Punctuation.)
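
Python's built-in re module does not support \p{...} property classes (the third-party regex module does), so this sketch checks the same Unicode general categories directly with unicodedata.category. It approximates a full match of [\p{L}\p{N}\-]+ and of [\p{L}\p{N}\p{Pd}]+; the function names are hypothetical. The two versions differ on non-ASCII dashes such as the en dash (U+2013, category Pd), while the underscore (category Pc) is rejected by both.

```python
import unicodedata


def _category(ch: str) -> str:
    """Two-letter Unicode general category, e.g. 'Lu', 'Nl', 'Pd'."""
    return unicodedata.category(ch)


def matches_hyphen_version(text: str) -> bool:
    # Approximates a full match of [\p{L}\p{N}\-]+ : letters (L*),
    # numbers (N*) or the ASCII hyphen-minus only.
    return bool(text) and all(
        _category(ch)[0] in "LN" or ch == "-" for ch in text
    )


def matches_pd_version(text: str) -> bool:
    # Approximates a full match of [\p{L}\p{N}\p{Pd}]+ : letters, numbers
    # or any character in the Pd (Punctuation, Dash) category.
    return bool(text) and all(
        _category(ch)[0] in "LN" or _category(ch) == "Pd" for ch in text
    )


samples = [
    "\u216b-Kapitel",  # U+216B ROMAN NUMERAL TWELVE (category Nl) + hyphen
    "2020\u20132021",  # en dash U+2013 (category Pd)
    "snake_case",      # underscore (category Pc)
]
for sample in samples:
    print(sample, matches_hyphen_version(sample), matches_pd_version(sample))
```

The minus sign U+2212 is category Sm (Math Symbol), not Pd, so neither version accepts it.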
