Regular expression for validating names and surnames?
Question
Although this seems like a trivial question, I am quite sure it is not :)
I need to validate the names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames from which I need to remove, as well as possible, any cruft I identify. How can I do that with a regular expression? If it were only English names, I think this would cut it:
```
^[a-z -']+$
```
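One detail worth checking in the pattern `^[a-z -']+$`: inside a character class, ` -'` is parsed as a range from the space (U+0020) to the apostrophe (U+0027), so it silently admits `!"#$%&` as well. Placing the hyphen last makes it a literal. The small C# sketch below compares the two orderings; the class and member names are illustrative only, and `RegexOptions.IgnoreCase` is an assumption that the English pattern was meant to accept capitalized names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper contrasting the two orderings of the same character class.
public static class EnglishNamePatterns
{
    // " -'" is a range U+0020..U+0027, which also includes ! " # $ % &
    public const string RangeBug = @"^[a-z -']+$";

    // Hyphen placed last, so it is a literal '-' rather than a range operator
    public const string Literal = @"^[a-z '-]+$";

    // Assumption: case-insensitive matching, so "Tom" is treated like "tom"
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
}
```

For example, `"Tom&Jerry"` matches `RangeBug` but not `Literal`, while `"Mary-Jane O'Neil"` matches both.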
However, I also need to support these cases:
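- Other punctuation marks, since they may be used in different countries (I don't know which ones, but maybe you do!)
- Different Unicode letter sets (accented letters, Greek, Japanese, Chinese, etc.)
- No digits, symbols, unnecessary punctuation, runes, etc.
- Titles, middle initials and suffixes are not part of this data
- First names are already separated from surnames.
- We are prepared to force ultra-rare names to be simplified (a person named "@" exists, but it makes no sense to allow that character everywhere; be pragmatic and sensible.)
- Note that many countries have laws about names, so there may be standards to follow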
Is there a standard way of validating these fields that I can implement to make sure our website users have a great experience and can actually use their names when registering in the list?
I am looking for something similar to the many "email address" regexes you can find on Google.
Recommended answer
I'll try to give a proper answer myself:
The only punctuation marks that should be allowed in a name are the full stop, the apostrophe and the hyphen. I haven't seen any other case in the list of corner cases.
Regarding numbers, there's only one case with an 8. I think I can safely disallow that.
Regarding letters, any letter is valid.
I also want to allow spaces.
Putting this together gives the following regex:
```
^[\p{L} .'-]+$
```
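One caveat with the pattern `^[\p{L} .'-]+$` as written: many scripts spell names with combining marks (Unicode category M), such as the vowel signs and virama in Devanagari or the niqqud in Hebrew, and `\p{L}` alone rejects them. This is the reason for adding `\p{M}` to the class. A minimal check, with `PatternCheck` as an illustrative name and the Devanagari name spelled out as escapes to avoid copy errors:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing a letters-only class with letters plus marks.
public static class PatternCheck
{
    // Letters only: rejects names that contain combining marks
    public const string LettersOnly = @"^[\p{L} .'-]+$";

    // Letters plus combining marks (\p{M}), hyphen kept last as a literal
    public const string LettersAndMarks = @"^[\p{L}\p{M} .'-]+$";

    // "आकाङ्क्षा": U+093E is a spacing vowel sign (Mc), U+094D the virama (Mn)
    public const string Akanksha = "\u0906\u0915\u093E\u0919\u094D\u0915\u094D\u0937\u093E";

    public static bool IsMatch(string pattern, string name) => Regex.IsMatch(name, pattern);
}
```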
This presents one problem: the apostrophe can be used as an attack vector, so it should be encoded.
So the validation code should be something like this (untested):
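```csharp
var name = nameParam.Trim();
// Any letter, plus space, full stop, apostrophe and hyphen
if (!Regex.IsMatch(name, @"^[\p{L} .'-]+$"))
    throw new ArgumentException("Invalid name.", nameof(nameParam));
name = name.Replace("'", "&#39;"); // &apos; does not work in IE
```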
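As a hedged alternative to the hand-written `Replace`, the framework's `System.Net.WebUtility.HtmlEncode` escapes the apostrophe as `&#39;` together with `<`, `>`, `&` and `"`, and is generally applied when the name is written into an HTML page rather than when it is stored. HTML encoding does not protect a SQL statement; that is what parameterized queries are for. The wrapper class name below is illustrative only:

```csharp
using System.Net;

// Hypothetical wrapper around the framework's HTML encoder.
public static class NameOutput
{
    // Encode at output time, when the stored name is rendered into HTML
    public static string ForHtml(string name) => WebUtility.HtmlEncode(name);
}
```

For example, `NameOutput.ForHtml("<xss>")` yields `&lt;xss&gt;`, so the markup is displayed as text instead of being interpreted.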
Can anyone think of a reason why a name should not pass this test, or of an XSS or SQL injection that could get through it?
Complete tested solution:
```csharp
using System;
using System.Text.RegularExpressions;

namespace test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var names = new string[]{"Hello World",
                "John",
                "João",
                "タロウ",
                "やまだ",
                "山田",
                "先生",
                "мыхаыл",
                "Θεοκλεια",
                "आकाङ्क्षा",
                "علاء الدين",
                "אַבְרָהָם",
                "മലയാളം",
                "상",
                "D'Addario",
                "John-Doe",
                "P.A.M.",
                "' --",
                "<xss>",
                "\""
            };
            foreach (var nameParam in names)
            {
                Console.Write(nameParam + " ");
                var name = nameParam.Trim();
                // Letters, combining marks, apostrophe, space, full stop and hyphen
                if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' .-]+$"))
                {
                    Console.WriteLine("fail");
                    continue;
                }
                name = name.Replace("'", "&#39;");
                Console.WriteLine(name);
            }
        }
    }
}
```
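One limitation shared by the patterns above: the .NET regex engine matches UTF-16 `char`s, so `\p{L}` does not match letters outside the Basic Multilingual Plane, which are stored as surrogate pairs. An example is 𠮷 (U+20BB7), a variant of 吉 that appears in some Japanese surnames. A hedged alternative is to apply the same rules per Unicode scalar value with `System.Text.Rune` (assuming .NET Core 3.0 or later); `NameRules.IsValidName` is a hypothetical name, and it accepts combining marks on the same assumption as the `\p{M}` in the tested solution.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: the answer's rules (any letter, plus space, full stop,
// apostrophe, hyphen) checked per Unicode scalar value instead of per UTF-16 char.
// Assumes .NET Core 3.0 or later for System.Text.Rune.
public static class NameRules
{
    public static bool IsValidName(string input)
    {
        if (input == null) return false;
        string name = input.Trim();
        if (name.Length == 0) return false;

        // EnumerateRunes yields whole code points, so surrogate pairs stay together
        foreach (Rune r in name.EnumerateRunes())
        {
            switch (Rune.GetUnicodeCategory(r))
            {
                // Letters in any script (what \p{L} covers)
                case UnicodeCategory.UppercaseLetter:
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.TitlecaseLetter:
                case UnicodeCategory.ModifierLetter:
                case UnicodeCategory.OtherLetter:
                // Combining marks (what \p{M} covers): vowel signs, viramas, niqqud
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    continue;
            }

            // The only non-letters allowed: space, full stop, apostrophe, hyphen
            int v = r.Value;
            if (v == ' ' || v == '.' || v == '\'' || v == '-')
                continue;

            // Digits, symbols, other punctuation, invalid sequences (U+FFFD)
            return false;
        }
        return true;
    }
}
```

Like the regex, this still accepts inputs such as `' --` that consist only of allowed punctuation; requiring at least one letter would be a separate, easy-to-add rule.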