Heuristics for splitting full names

Question

Splitting a full name into first and last names is an unsolvable problem because names are really, really complicated. As a result, my model, which represents authors and other contributors to a book, includes both name and filingName fields, where filingName should usually be "Last, First" (for Western names).

However, as a convenience for my users, I'd like to have my app make a reasonable guess at the filing name when the user fills in the regular name. The user can edit the filing name if the guess is wrong, of course, but if I guess right, I'll have saved them some time. Currently I'm simply assuming the last space-separated "word" is the last name and moving it to the front with a comma:

// Split the regular name on whitespace.
NSMutableArray * parts = [self.name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet].mutableCopy;

// A single-word name has nothing to reorder.
if(parts.count < 2) {
    return self.name;
}

// Treat the last word as the last name and move it to the front.
NSString * lastName = parts.lastObject;
[parts removeLastObject];

return [NSString stringWithFormat:@"%@, %@", lastName, [parts componentsJoinedByString:@" "]];

I can immediately think of one case where this will lead me astray: suffixes like "Jr". But I'm sure there are many others. Are there any good resources explaining common naming caveats, or good examples of code tackling this problem, that I can use to improve my heuristic? I'm using Objective-C on the Mac (in case there's some obscure corner of a framework that could help me), but I'm willing to learn from code written in any language.
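
Since code in any language is welcome here, the following is a minimal Python sketch of how the "Jr" case could be handled, not a definitive implementation. The `SUFFIXES` set and the `guess_filing_name` function are hypothetical names introduced for illustration, the suffix list is deliberately incomplete, and placing the suffix after the first name ("Last, First, Jr.") is an assumed filing convention rather than something stated in the question.

```python
# Illustrative, incomplete suffix list; an assumption, not taken from the question.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def guess_filing_name(name: str) -> str:
    """Guess a "Last, First" filing name, keeping one trailing suffix at the end."""
    parts = name.split()  # splits on any run of whitespace and drops empty strings
    suffix = None
    # Peel off a single trailing suffix such as "Jr." or "III".
    # The comparison ignores case and surrounding periods or commas.
    if len(parts) > 2 and parts[-1].strip(".,").lower() in SUFFIXES:
        suffix = parts.pop()
        parts[-1] = parts[-1].rstrip(",")  # "King," in "King, Jr." becomes "King"
    if len(parts) < 2:
        return name  # single-word names are returned unchanged
    last = parts.pop()
    guess = f"{last}, {' '.join(parts)}"
    return f"{guess}, {suffix}" if suffix else guess


print(guess_filing_name("Martin Luther King Jr."))  # King, Martin Luther, Jr.
print(guess_filing_name("Jane Austen"))             # Austen, Jane
```

A suffix is only recognized when at least two other words precede it, so a two-word name whose second word happens to match the list is still split as before.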

This sort of question has been asked before, but most answers either focus on the mechanics of splitting apart a string, or devolve into "design your model differently". I am designing my model differently; I'm just looking to let the computer do most of my users' work for them.

As I said earlier, this code is mainly handling the names of authors and other contributors to books. Some of the specific ramifications of that include:

  • There should only be one name in name, because I support attaching multiple authors to a book.
  • Most names will not have titles, but professional titles like "Dr." could show up. Ideally these would be discarded, not treated as part of the first name.
  • The names will usually be of people, but could sometimes be of organizations. I'm perfectly willing to risk mangling organization names to get better person name handling.
  • I expect I will mostly be handling European names, although detecting the orthography of the name should not be difficult.
  • The code should not be particularly sensitive to the user's locale.
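
As a rough illustration of the second point above, here is a minimal Python sketch that drops leading professional titles before applying the same last-word guess. The `TITLES` set and both function names are hypothetical, and the title list is an incomplete assumption that a real app would need to extend.

```python
# Illustrative, incomplete title list; an assumption, not taken from the question.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "rev"}


def strip_titles(name: str) -> str:
    """Drop leading titles such as "Dr." so they are not treated as a first name."""
    parts = name.split()
    # Always keep at least one word, so a name consisting only of a title survives.
    while len(parts) > 1 and parts[0].rstrip(".").lower() in TITLES:
        parts.pop(0)
    return " ".join(parts)


def filing_name_without_titles(name: str) -> str:
    """Guess "Last, First" after discarding any leading titles."""
    parts = strip_titles(name).split()
    if len(parts) < 2:
        # Nothing to reorder; fall back to the original text if nothing is left.
        return " ".join(parts) or name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(filing_name_without_titles("Dr. Jane Goodall"))  # Goodall, Jane
```

This trades correctness on pen names for better handling of ordinary names: "Dr. Seuss" becomes "Seuss", which the user would have to fix by hand.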

Recommended answer

When you build a software system, there are always hard problems that consume a lot of time. I wouldn't get stuck on this one, because there are no worldwide naming conventions or rules. I don't think asking users to enter their filing name will be a bother, since they'll only do it once.

That seems to be the easier solution IMHO.
