Regular expression for validating names and surnames?


Question

Although this seems like a trivial question, I am quite sure it is not :)

I need to validate names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames from which I need to remove, as well as possible, any cruft I identify. How can I do that with a regular expression? If the names were only English ones, I think this would cut it:

^[a-z -']+$

However, I also need to support these cases:

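  • Other punctuation symbols, as they might be used in different countries (no idea which, but maybe you do!)
  • Different Unicode letter sets (accented letters, Greek, Japanese, Chinese, and so on)
  • No numbers, symbols, unnecessary punctuation, runes, etc.
  • Titles, middle initials and suffixes are not part of this data
  • Names are already separated from surnames.
  • We are prepared to force ultra-rare names to be simplified (a person named '@' exists, but it makes no sense to allow that character everywhere; use pragmatism and good sense).
  • Note that many countries have laws about names, so there are standards to follow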

Is there a standard way of validating these fields that I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?

I am looking for something similar to the many "email address" regexes that you can find on Google.

Recommended answer

I'll try to give a proper answer myself:

The only punctuation marks that should be allowed in a name are the full stop, the apostrophe and the hyphen. I haven't seen any other in the list of corner cases.

Regarding numbers, there's only one case, with an 8. I think I can safely disallow that.

Regarding letters, any letter is valid.

I also want to include spaces.

This would sum up to this regex:

^[\p{L} .'-]+$

This presents one problem: the apostrophe can be used as an attack vector, so it should be encoded.

So the validation code should be something like this (untested):

var name = nameParam.Trim();
// Allow only letters, space, full stop, apostrophe and hyphen
if (!Regex.IsMatch(name, @"^[\p{L} .'-]+$"))
    throw new ArgumentException("nameParam");
name = name.Replace("'", "&#39;");  // &apos; does not work in IE
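
As a possible alternative to replacing the apostrophe by hand, the encoding step could be left to the framework. The sketch below assumes the name is going to be rendered into HTML and uses `System.Net.WebUtility.HtmlEncode`; the `NameSanitizer` class and its `ValidateAndEncode` method are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class NameSanitizer
{
    // Letters, space, full stop, apostrophe and hyphen, as in the regex above.
    private static readonly Regex AllowedName = new Regex(@"^[\p{L} .'-]+$");

    // Validates the trimmed name, then HTML-encodes it for output.
    public static string ValidateAndEncode(string nameParam)
    {
        if (nameParam == null)
            throw new ArgumentNullException(nameof(nameParam));

        var name = nameParam.Trim();
        if (!AllowedName.IsMatch(name))
            throw new ArgumentException("Name contains disallowed characters.", nameof(nameParam));

        // HtmlEncode escapes the apostrophe as well as <, >, & and the double quote.
        return WebUtility.HtmlEncode(name);
    }
}
```

With this sketch, `NameSanitizer.ValidateAndEncode("D'Addario")` returns `D&#39;Addario`, and `<xss>` is rejected with an `ArgumentException`. HTML encoding only addresses HTML output; for SQL, the usual defence is a parameterized query rather than character filtering.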

Can anyone think of a reason why a name should not pass this test, or of an XSS or SQL injection that could pass?

Full tested solution

using System;
using System.Text.RegularExpressions;

namespace test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var names = new string[]{"Hello World", 
                "John",
                "João",
                "タロウ",
                "やまだ",
                "山田",
                "先生",
                "мыхаыл",
                "Θεοκλεια",
                "आकाङ्क्षा",
                "علاء الدين",
                "אַבְרָהָם",
                "മലയാളം",
                "상",
                "D'Addario",
                "John-Doe",
                "P.A.M.",
                "' --",
                "<xss>",
                "\""
            };
            foreach (var nameParam in names)
            {
                Console.Write(nameParam+" ");
                var name = nameParam.Trim();
                if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' .-]+$"))
                {
                    Console.WriteLine("fail");
                    continue;
                }
                name = name.Replace("'", "&#39;");
                Console.WriteLine(name);
            }
        }
    }
}
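
The full solution adds `\p{M}` to the character class. A likely reason, sketched here as my own assumption, is that `\p{L}` alone does not match combining marks. These occur in scripts such as Devanagari and pointed Hebrew, and in Latin text stored in decomposed form. The `CombiningMarkDemo` class and its members are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named for illustration only.
public static class CombiningMarkDemo
{
    // Same character class as the full solution, without and with \p{M}.
    private static readonly Regex LettersOnly = new Regex(@"^[\p{L}' .-]+$");
    private static readonly Regex LettersAndMarks = new Regex(@"^[\p{L}\p{M}' .-]+$");

    public static bool PassesLettersOnly(string name) => LettersOnly.IsMatch(name);

    public static bool PassesLettersAndMarks(string name) => LettersAndMarks.IsMatch(name);

    public static void Run()
    {
        // "João" with the tilde stored as a separate combining character (U+0303).
        string decomposed = "Joa\u0303o";

        Console.WriteLine(PassesLettersOnly(decomposed));     // False
        Console.WriteLine(PassesLettersAndMarks(decomposed)); // True
    }
}
```

The precomposed spelling (`"Jo\u00E3o"`) passes both patterns, so whether `\p{M}` matters for Latin names depends on how the input was normalized.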
