How to check different spellings of a person's full name

Question

I am trying to create a regular expression that searches a huge document for a person's full name. In the text the name can be written in full, or the first names can be abbreviated to a single letter, abbreviated to a letter followed by a dot, or omitted. For instance, my current search for _ALBERTO JORGE ALONSO CALEFACCION_ is:

preg_match('/([;:.,&\s\xc2\-(){}!"\'<>]{1})(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)([;:.,&\s\xc2(){}!"\'<>]{1})/i', $text, $match);

An asterisk (*) may be present between the first names and the last names.

This works when every first name is present in some form. But I don't know how to extend the expression to cover the case where first names are omitted. Can you help me?

Recommended answer

Let's start by simplifying what you have.

Starting point:

/([;:.,&\s\xc2\-(){}!"'<>]{1})(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)([;:.,&\s\xc2(){}!"'<>]{1})/i

As I said in my comment, \b is a "word break" (it matches at a word boundary), so you can simplify a lot of that:

/\b(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i

(Added bonus: the match no longer includes the characters on either side, and the pattern now also matches at the start and end of the text.)
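The bonus can be checked with a small sketch. It assumes the two patterns above are ported unchanged from the PHP `preg_match` call to Python's `re` module; the names `WITH_DELIMITERS` and `WITH_BOUNDARIES` are illustrative and not part of the original answer.

```python
import re

# Original form: requires one delimiter character on each side of the name.
WITH_DELIMITERS = re.compile(
    r"""([;:.,&\s\xc2\-(){}!"'<>]{1})(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)([;:.,&\s\xc2(){}!"'<>]{1})""",
    re.IGNORECASE,
)

# Simplified form: \b asserts a word boundary without consuming a character.
WITH_BOUNDARIES = re.compile(
    r"\b(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

bare_name = "ALBERTO JORGE ALONSO CALEFACCION"
in_sentence = "Signed by ALBERTO JORGE ALONSO CALEFACCION."

# The name is the whole text: there is no delimiter before or after it.
delimiter_hit = WITH_DELIMITERS.search(bare_name)
boundary_hit = WITH_BOUNDARIES.search(bare_name)

# Inside a sentence both forms match, but the matched text differs.
delimiter_text = WITH_DELIMITERS.search(in_sentence).group(0)
boundary_text = WITH_BOUNDARIES.search(in_sentence).group(0)

print(delimiter_hit, boundary_hit.group(0))
print(repr(delimiter_text), repr(boundary_text))
```

In the sentence case, the delimiter form includes the leading space and the trailing full stop in its match, while the \b form returns only the name.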

Next, you can use the ? token to make the dots optional (the dots should be escaped, by the way; an unescaped . is special and means "match any character"):

/\b(ALBERTO|A\.?)[\s\xc2-]+(JORGE|J\.?)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i
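To see why the escaping matters, here is a hedged sketch that ports the two patterns above to Python's `re` module (an assumption; the question uses PHP). The names `UNESCAPED` and `ESCAPED` and the sample text are made up for illustration.

```python
import re

# Before: "A." means "A followed by any character", so "AX" passes as an initial.
UNESCAPED = re.compile(
    r"\b(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

# After: "A\.?" means "A, optionally followed by a literal dot".
ESCAPED = re.compile(
    r"\b(ALBERTO|A\.?)[\s\xc2-]+(JORGE|J\.?)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

false_positive = "AX J. ALONSO CALEFACCION"   # "AX" is not an initial of ALBERTO
with_dots = "A. J. ALONSO CALEFACCION"
without_dots = "A J ALONSO CALEFACCION"

unescaped_accepts_ax = UNESCAPED.search(false_positive) is not None
escaped_accepts_ax = ESCAPED.search(false_positive) is not None

print(unescaped_accepts_ax, escaped_accepts_ax)
print(bool(ESCAPED.fullmatch(with_dots)), bool(ESCAPED.fullmatch(without_dots)))
```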

Finally, to actually answer your question, you have 2 choices. Either make the entire parenthesized name group optional, or add a new empty alternative to it. The first is the more flexible, since we also need to cope with the whitespace that follows each name:

/\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i

Note that if you're reading the matched parts you'll need to update your indices, because the new wrapping parentheses are capture groups too. Also note that this fixes an issue where omitting the second name (JORGE) still required an extra space.
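The index shift can be made concrete with a sketch in Python's `re` (an assumed port of the PHP pattern). `NESTED` is the final pattern as given above; `FLAT` is a hypothetical variant, not part of the original answer, that turns the two wrapper groups into non-capturing `(?:...)` groups so the name parts keep the indices 1 to 4.

```python
import re

# Final pattern from the answer: the two wrapper groups also capture.
NESTED = re.compile(
    r"\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

# Assumed variant: the wrappers are non-capturing, so only the names are numbered.
FLAT = re.compile(
    r"\b(?:(ALBERTO|A\.?)[\s\xc2-]+(?:(JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

text = "A. J. ALONSO CALEFACCION"
nested_groups = NESTED.search(text).groups()
flat_groups = FLAT.search(text).groups()

print(nested_groups)  # six groups; ALONSO is now group 5
print(flat_groups)    # four groups; ALONSO is still group 3

# When the first names are omitted, their groups are simply unset (None).
surnames_only = FLAT.search("ALONSO CALEFACCION").groups()
print(surnames_only)
```

Whether to renumber the indices or switch to non-capturing wrappers is a matter of taste; the non-capturing form keeps code that reads groups 3 and 4 for the surnames working unchanged.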

This will match things like A. J. ALONSO CALEFACCION, A. ALONSO CALEFACCION and ALONSO CALEFACCION, but it will not include a lone second initial, as in J. ALONSO CALEFACCION, where only the ALONSO CALEFACCION part is matched (it's only a small tweak if you do want that).
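These cases can be checked directly. The sketch below assumes the final pattern is ported as-is to Python's `re`; `FINAL` and `covers_whole` are illustrative names. It uses `fullmatch` to ask whether the whole string is covered, because a plain search would still find the surnames inside J. ALONSO CALEFACCION.

```python
import re

FINAL = re.compile(
    r"\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

def covers_whole(text: str) -> bool:
    """Return True if the pattern matches the entire string, not just a part of it."""
    return FINAL.fullmatch(text) is not None

samples = [
    "ALBERTO JORGE ALONSO CALEFACCION",
    "A. J. ALONSO CALEFACCION",
    "A. ALONSO CALEFACCION",
    "ALONSO CALEFACCION",
    "J. ALONSO CALEFACCION",
]
results = {text: covers_whole(text) for text in samples}
for text, ok in results.items():
    print(f"{text!r}: {ok}")

# A search on the last sample still succeeds, but leaves the "J. " out of the match.
partial = FINAL.search("J. ALONSO CALEFACCION").group(0)
print(repr(partial))
```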

Breaking up that final string for clarity:

/\b
(
    (ALBERTO|A\.?)[\s\xc2-]+
    (
        (JORGE|J\.?)[\s\xc2,]+
    )?
)?
(ALONSO)[\s\xc2*-]+
(CALEFACCION)
\b/i
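The broken-up layout is for reading only, since the line breaks would otherwise become part of the pattern. Many regex engines have an extended mode that ignores such whitespace (the `x` modifier in PHP's PCRE, `re.VERBOSE` in Python). A minimal sketch, assuming a port to Python's `re` and using illustrative names, that keeps the layout and checks it behaves like the one-line form:

```python
import re

ONE_LINE = re.compile(
    r"\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b",
    re.IGNORECASE,
)

# Same pattern, laid out as in the breakdown above. With re.VERBOSE,
# whitespace outside character classes is ignored and "#" starts a comment.
MULTI_LINE = re.compile(
    r"""
    \b
    (                               # optional first names
        (ALBERTO|A\.?)[\s\xc2-]+
        (
            (JORGE|J\.?)[\s\xc2,]+  # optional second name
        )?
    )?
    (ALONSO)[\s\xc2*-]+
    (CALEFACCION)
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)

samples = [
    "ALBERTO JORGE ALONSO CALEFACCION",
    "a. j. alonso calefaccion",
    "ALONSO * CALEFACCION",
    "J. ALONSO CALEFACCION",
]

def matched_text(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    hit = pattern.search(text)
    return hit.group(0) if hit else None

same = all(matched_text(ONE_LINE, s) == matched_text(MULTI_LINE, s) for s in samples)
print(same)
```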

Finally, it's an odd thought, but you could write the names which can be initials in this form: (A(LBERTO|\.|)), which means you're not repeating the initial (a potential source of mistakes).
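One way to use that form is to generate it from the name, so the initial is never typed by hand. This is only a sketch in Python's `re` (an assumed port from PHP); the helper `initial_or_full` and the way the pattern is assembled are hypothetical, not something the answer prescribes.

```python
import re

def initial_or_full(name: str) -> str:
    """Build "(A(LBERTO|\\.|))": the initial, then the rest of the name, a dot, or nothing."""
    return "({}({}|\\.|))".format(re.escape(name[0]), re.escape(name[1:]))

first = initial_or_full("ALBERTO")
second = initial_or_full("JORGE")

# Same structure as the final pattern above, with the generated name groups dropped in.
pattern_text = (
    r"\b(" + first + r"[\s\xc2-]+(" + second + r"[\s\xc2,]+)?)?"
    r"(ALONSO)[\s\xc2*-]+(CALEFACCION)\b"
)
GENERATED = re.compile(pattern_text, re.IGNORECASE)

print(first)

samples = [
    "ALBERTO JORGE ALONSO CALEFACCION",
    "A. J. ALONSO CALEFACCION",
    "A ALONSO CALEFACCION",
    "ALONSO CALEFACCION",
]
all_match = all(GENERATED.fullmatch(s) for s in samples)
print(all_match)
```

Note that the extra inner parentheses add capture groups again, so any code that reads matched parts by index needs the same care as before.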
