Heuristics for splitting full names

Posted: 2021/9/15 19:09:52. Tags: user-experience, heuristics, names

Question

Splitting a full name into first and last names is an unsolvable problem because names are really, really complicated. As a result, my model, which represents authors and other contributors to a book, includes both name and filingName fields, where filingName should usually be "Last, First" (for Western names).

However, as a convenience for my users, I'd like to have my app make a reasonable guess at the filing name when the user fills in the regular name. The user can edit the filing name if the guess is wrong, of course, but if I guess right, I'll have saved them some time. Currently I'm simply assuming the last space-separated "word" is the last name and moving it to the front with a comma:

// Split the full name on whitespace.
NSMutableArray *parts = [self.name componentsSeparatedByCharactersInSet:NSCharacterSet.whitespaceCharacterSet].mutableCopy;

// A single-word name has nothing to reorder.
if (parts.count < 2) {
    return self.name;
}

// Treat the last word as the surname and move it to the front.
NSString *lastName = parts.lastObject;
[parts removeLastObject];

return [NSString stringWithFormat:@"%@, %@", lastName, [parts componentsJoinedByString:@" "]];

I can immediately think of one case where this will lead me astray: suffixes like "Jr". But I'm sure there are many others. Are there any good resources explaining common naming caveats, or good examples of code tackling this problem, that I can use to improve my heuristic? I'm using Objective-C on the Mac (in case there's some obscure corner of a framework that could help me), but I'm willing to learn from code written in any language.
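
To make the suffix case concrete, here is a minimal sketch in Python (chosen only because code in any language is welcome here). It peels off one trailing suffix before applying the same "last word is the surname" rule, and appends the suffix at the end, which is one common filing convention. The `SUFFIXES` set and the `filing_name` function are hypothetical names made up for this illustration, and the list is deliberately incomplete.

```python
# Hypothetical suffix list for illustration; a real one would need to be longer.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def filing_name(name: str) -> str:
    """Guess a "Last, First" filing name, keeping a trailing suffix at the end."""
    parts = name.split()  # split on any run of whitespace
    suffix = None
    # Peel off one trailing suffix such as "Jr." or "III", if present.
    # Requiring more than two words keeps at least a first and a last name.
    if len(parts) > 2 and parts[-1].rstrip(".").lower() in SUFFIXES:
        suffix = parts.pop()
        # Drop a comma left behind by forms like "Sammy Davis, Jr."
        parts[-1] = parts[-1].rstrip(",")
    if len(parts) < 2:
        return name
    last = parts.pop()
    result = f"{last}, {' '.join(parts)}"
    return f"{result}, {suffix}" if suffix else result
```

With this sketch, "Martin Luther King Jr." would become "King, Martin Luther, Jr." instead of "Jr., Martin Luther King". A two-word input such as "John Jr." is left to the plain rule, so the user would still have to correct that guess by hand.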

This sort of question has been asked before, but most answers either focus on the mechanics of splitting apart a string, or devolve into "design your model differently". I am designing my model differently; I'm just looking to let the computer do most of my users' work for them.

As I said earlier, this code is mainly handling the names of authors and other contributors to books. Some of the specific ramifications of that include:

There should only be one name in name, because I support attaching multiple authors to a book.
Most names will not have titles, but professional titles like "Dr." could show up. Ideally these would be discarded, not treated as part of the first name.
The names will usually be of people, but could sometimes be of organizations. I'm perfectly willing to risk mangling organization names to get better person name handling.
I expect I will mostly be handling European names, although detecting the orthography of the name should not be difficult.
The code should not be particularly sensitive to the user's locale.
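
Two of the points above, discarding professional titles and coping with European names, can be sketched as follows. This is again Python and only an illustration under stated assumptions: `TITLES`, `PARTICLES`, and `guess_filing_name` are hypothetical names, both lists are far from complete, and whether a particle such as "van" or "de" files with the surname varies by country, so the output is only a first guess for the user to edit.

```python
# Hypothetical, deliberately short lists; real data would need many more entries.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}
PARTICLES = {"van", "von", "de", "der", "den", "la", "le", "di", "da"}


def guess_filing_name(name: str) -> str:
    """Guess "Last, First", dropping leading titles and keeping surname particles."""
    parts = name.split()
    # Discard leading titles such as "Dr." so they never join the first name.
    while len(parts) > 1 and parts[0].rstrip(".").lower() in TITLES:
        parts.pop(0)
    if len(parts) < 2:
        return " ".join(parts)
    # Start the surname at the last word, then extend it leftwards over
    # lowercase particles ("van", "de", ...), always leaving one given name.
    start = len(parts) - 1
    while start > 1 and parts[start - 1] in PARTICLES:
        start -= 1
    last = " ".join(parts[start:])
    first = " ".join(parts[:start])
    return f"{last}, {first}"
```

The comparisons use plain string matching and `str.lower()`, neither of which depends on the user's locale. Only lowercase particles are matched, so a capitalized "De" or "Van" is treated as an ordinary given or middle name.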
