Regex for names with special characters (Unicode)


Question

Okay, I have been reading about regex all day and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept too.

I basically need a regex that checks that the name is at least two words and that it does not contain numbers or special characters like !"#¤%&/()=..., while the words can still contain characters like æ, é, Â and so on.

An example of an accepted name would be "John Elkjærd" or "André Svenson".
A non-accepted name would be "Hans", "H4nn3 Andersen" or "Martin Henriksen!".

If it matters, I use the JavaScript .match() function on the client side and want to use PHP's preg_replace() only "in negative" on the server side (removing non-matching characters).

Any help would be appreciated.

Update:
Okay, thanks to Alix Axel's answer I have the important part down, the server-side one.

But as the page from LightWing's answer suggests, I'm unable to find anything about Unicode support in JavaScript, so I ended up with half a solution for the client side, just checking for at least two words and a minimum of 5 characters, like this:

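// minWords is assumed to be defined elsewhere (2 for "at least two words").
// match() returns null when nothing matches, so fall back to an empty array.
if ((name.match(/\S+/g) || []).length >= minWords && name.length >= 5) {
  // valid
}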

An alternative would be to specify all the Unicode characters explicitly, as suggested in shifty's answer. I might end up doing something like that, along with the solution above, but it is a bit impractical.
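As a rough illustration of the "list the characters yourself" alternative, here is a minimal JavaScript sketch that needs no Unicode property support. The ranges are an assumption, not taken from shifty's answer: they cover basic Latin letters plus the accented Latin letters from U+00C0 to U+024F, and the name isLatinName is hypothetical.

```javascript
// Hypothetical helper: the name and the chosen ranges are illustrative only.
// Letters covered: A-Z, a-z, and the Latin letters U+00C0..U+024F,
// skipping U+00D7 (multiplication sign) and U+00F7 (division sign).
const LATIN_WORD = "[A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u024F]+";

// At least two words, separated by single whitespace characters.
const latinName = new RegExp("^" + LATIN_WORD + "(?:\\s" + LATIN_WORD + ")+$");

function isLatinName(name) {
  return latinName.test(name);
}

console.log(isLatinName("John Elkjærd"));      // true
console.log(isLatinName("André Svenson"));     // true
console.log(isLatinName("Hans"));              // false
console.log(isLatinName("H4nn3 Andersen"));    // false
console.log(isLatinName("Martin Henriksen!")); // false
```

The obvious cost is the one the question already names: every script or symbol outside the listed ranges (Greek, Cyrillic, apostrophes, hyphens) has to be added by hand.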

Recommended answer

Try the following regex:

^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$

In PHP this translates to:

if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
    // valid
}

You should read it like this:

^   # start of subject
    (?:     # match this:
        [           # match a:
            \p{L}       # Unicode letter, or
            \p{Mn}      # Unicode accents, or
            \p{Pd}      # Unicode hyphens, or
            \'          # single quote, or
            \x{2019}    # single quote (alternative)
        ]+              # one or more times
        \s          # any kind of space
        [           # match a:
            \p{L}       # Unicode letter, or
            \p{Mn}      # Unicode accents, or
            \p{Pd}      # Unicode hyphens, or
            \'          # single quote, or
            \x{2019}    # single quote (alternative)
        ]+              # one or more times
        \s?         # any kind of space (optional: 0 or 1 times)
    )+      # one or more times
$   # end of subject
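To see why the class lists \p{Mn} next to \p{L}, note that an accented letter may arrive either as one precomposed code point or as a base letter followed by a combining mark. The following JavaScript sketch shows the difference. It assumes an engine with ES2018 Unicode property escapes, and the variable names are illustrative.

```javascript
// Assumes ES2018 Unicode property escapes; the `u` flag is required for \p{...}.
const lettersOnly = /^\p{L}+$/u;
const lettersAndMarks = /^[\p{L}\p{Mn}]+$/u;

const composed = "Andr\u00E9";    // "é" as a single code point (U+00E9)
const decomposed = "Andre\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

console.log(lettersOnly.test(composed));       // true
console.log(lettersOnly.test(decomposed));     // false: U+0301 is a mark, not a letter
console.log(lettersAndMarks.test(decomposed)); // true
console.log(/^\p{Pd}$/u.test("-"));            // true: hyphen-minus is dash punctuation
```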

I honestly don't know how to port this to JavaScript, and I'm not even sure JavaScript supports Unicode properties, but in PHP (PCRE) this seems to work flawlessly @ IDEOne.com:

$names = array
(
    'Alix',
    'André Svenson',
    'H4nn3 Andersen',
    'Hans',
    'John Elkjærd',
    'Kristoffer la Cour',
    'Marco d\'Almeida',
    'Martin Henriksen!',
);

foreach ($names as $name)
{
    echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}

I'm sorry I can't help you with the JavaScript part, but probably someone here will.

Valid:

  • John Elkjærd
  • André Svenson
  • Marco d'Almeida
  • Kristoffer la Cour

Invalid:

  • Hans
  • H4nn3 Andersen
  • Martin Henriksen!
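For the client side, JavaScript engines that implement ES2018 accept Unicode property escapes when the u flag is set, so the same pattern can be ported fairly directly. The sketch below is one possible port, not part of the original answer. The helper name isValidName is hypothetical, and PCRE's \x{2019} is written as \u2019.

```javascript
// Sketch of a JavaScript port; assumes ES2018 Unicode property escapes.
// The `u` flag is mandatory for \p{...}; PCRE's \x{2019} becomes \u2019.
const WORD = "[\\p{L}\\p{Mn}\\p{Pd}'\\u2019]+";
const namePattern = new RegExp("^(?:" + WORD + "\\s" + WORD + "\\s?)+$", "u");

// Hypothetical helper name, not part of the original answer.
function isValidName(name) {
  return namePattern.test(name);
}

const samples = [
  "Alix",
  "André Svenson",
  "H4nn3 Andersen",
  "Hans",
  "John Elkjærd",
  "Kristoffer la Cour",
  "Marco d'Almeida",
  "Martin Henriksen!",
];

for (const name of samples) {
  console.log(`${name} is ${isValidName(name) ? "valid" : "invalid"}`);
}
```

Like the PHP version, this pattern tolerates one trailing whitespace character because of the final \s?, so trimming the input first may be worthwhile.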

To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:

$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '', $name);

Examples:

  • H4nn3 Andersen -> Hnn Andersen
  • Martin Henriksen! -> Martin Henriksen
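The same negated character class can be used in JavaScript, for instance to clean a field before submitting it. This is only a sketch under the assumption that the engine supports ES2018 Unicode property escapes, and stripInvalid is a hypothetical helper name.

```javascript
// Sketch: remove everything that is not a letter, mark, dash,
// apostrophe (' or U+2019), or whitespace.
// `g` replaces every occurrence; `u` is required for \p{...}.
const invalidChars = /[^\p{L}\p{Mn}\p{Pd}'\u2019\s]/gu;

// Hypothetical helper name, used for illustration only.
function stripInvalid(name) {
  return name.replace(invalidChars, "");
}

console.log(stripInvalid("H4nn3 Andersen"));    // Hnn Andersen
console.log(stripInvalid("Martin Henriksen!")); // Martin Henriksen
```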

Note that you always need to use the u modifier.
