Regex for names with special characters (Unicode)
Problem description
Okay, I have been reading about regex all day and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept.
I basically need a regex that checks that the name consists of at least two words and does not contain numbers or special characters like !"#¤%&/()=..., while the words can still contain characters like æ, é, Â and so on.
An example of an accepted name would be: "John Elkjærd" or "André Svenson"
A non-accepted name would be: "Hans", "H4nn3 Andersen" or "Martin Henriksen!"
If it matters, I use the JavaScript .match() function on the client side, and on the server side I want to use PHP's preg_replace() only "in negative", that is, to remove non-matching characters.
Any help would be greatly appreciated.
Update:
Okay, thanks to Alix Axel's answer I have the important part down, the server-side one.
But as the page linked in LightWing's answer suggests, I was unable to find anything about Unicode support in JavaScript, so I ended up with half a solution for the client side, just checking for at least two words and a minimum of 5 characters, like this:
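var minWords = 2; // assumed value: the question asks for at least two words
var words = name.match(/\S+/g) || []; // match() returns null when nothing matches
if (words.length >= minWords && name.length >= 5) {
    // valid
}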
An alternative would be to specify all the Unicode characters explicitly, as suggested in shifty's answer. I might end up doing something like that, along with the solution above, but it is a bit impractical.
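The explicit-range alternative mentioned above might look roughly like the following sketch. It assumes that only letters from the Latin-1 Supplement block (which includes æ, é and Â) need to be accepted; letters outside that block, such as ł, would be rejected. The names LATIN_WORD, LATIN_NAME and isLatinName are hypothetical and not taken from any of the answers.

```javascript
// Sketch only: ASCII letters plus the Latin-1 Supplement letters
// (U+00C0-U+00FF without the multiplication and division signs at
// U+00D7 and U+00F7), an apostrophe and a hyphen.
var LATIN_WORD = "[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF'-]+";

// At least two words, separated by single whitespace characters.
var LATIN_NAME = new RegExp("^" + LATIN_WORD + "(?:\\s" + LATIN_WORD + ")+$");

// Hypothetical helper: true when the name passes the check above.
function isLatinName(name) {
  return LATIN_NAME.test(name);
}
```

Because the ranges are written with \uXXXX escapes, this form does not depend on Unicode property support in the regex engine, at the cost of having to extend the ranges by hand for every additional script.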
Recommended answer
Try the following regular expression:
^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$
In PHP this translates to:
if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
// valid
}
You should read it like this:
^ # start of subject
(?: # match this:
[ # match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s # any kind of space
[ # match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s? # any kind of space (0 or 1 times)
)+ # one or more times
$ # end of subject
I honestly don't know how to port this to JavaScript, and I'm not even sure JavaScript supports Unicode properties, but in PHP PCRE this seems to work flawlessly (tested on IDEOne.com):
$names = array
(
'Alix',
'André Svenson',
'H4nn3 Andersen',
'Hans',
'John Elkjærd',
'Kristoffer la Cour',
'Marco d\'Almeida',
'Martin Henriksen!',
);
foreach ($names as $name)
{
echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}
I'm sorry I can't help you with the JavaScript part, but probably someone here will.
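For readers on current JavaScript engines, a port could look like the sketch below. It assumes an engine that implements ES2018 Unicode property escapes (\p{...} together with the u flag); older engines will throw a SyntaxError when building the pattern. Two details differ from the PCRE version: the code point is written as \u2019 instead of \x{2019}, and the apostrophe is left unescaped because the u flag rejects unnecessary escapes. The names NAME_CHARS, NAME_PATTERN and isValidName are hypothetical.

```javascript
// Same character class as the PHP pattern: letters, nonspacing marks,
// dash punctuation, the ASCII apostrophe and U+2019.
const NAME_CHARS = "[\\p{L}\\p{Mn}\\p{Pd}'\\u2019]+";

// Same overall shape as the PHP pattern; the "u" flag enables \p{...}.
const NAME_PATTERN = new RegExp(`^(?:${NAME_CHARS}\\s${NAME_CHARS}\\s?)+$`, "u");

// Hypothetical helper: true when the name matches the pattern.
function isValidName(name) {
  return NAME_PATTERN.test(name);
}

const sampleNames = ["André Svenson", "H4nn3 Andersen", "Hans", "John Elkjærd"];
for (const name of sampleNames) {
  console.log(`${name} is ${isValidName(name) ? "valid" : "invalid"}`);
}
```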
Valid:
- John Elkjærd
- André Svenson
- Marco d'Almeida
- Kristoffer la Cour
Invalid:
- Hans
- H4nn3 Andersen
- Martin Henriksen!
To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:
$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '', $name);
Examples:
- H4nn3 Andersen -> Hnn Andersen
- Martin Henriksen! -> Martin Henriksen
Note that you always need to use the u modifier.
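If the same stripping is ever wanted on the client, a JavaScript counterpart might look like this sketch. It assumes an engine with ES2018 Unicode property escapes; the function name stripInvalidNameChars is hypothetical. The g flag is needed so that every invalid character is removed, not just the first one, and the u flag plays the same role as the PCRE u modifier.

```javascript
// Remove every character that is not a letter, nonspacing mark,
// dash punctuation, apostrophe (ASCII or U+2019) or whitespace.
function stripInvalidNameChars(name) {
  return name.replace(/[^\p{L}\p{Mn}\p{Pd}'\u2019\s]/gu, "");
}

console.log(stripInvalidNameChars("H4nn3 Andersen"));    // Hnn Andersen
console.log(stripInvalidNameChars("Martin Henriksen!")); // Martin Henriksen
```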