What are all of the allowable characters for people's names?

Question

There are the standard A-Z, a-z characters, but also there are hyphens, em dashes, quotes, etc.

Plus, there are all of the international characters, like umlauts, etc.

So, for an English-based system, what's the complete set? What about sets for other languages? What about UTF8, UTF16, etc?

Bonus question: How many name fields are needed, and what are their maximum lengths?

There are definitely two different types of characters involved in people's names, those that are there as part of the context, and those that are there for structural reasons. I don't want to limit or interfere with the context characters, but I do need to deal with the structural ones.

For example, I had a name come in that was separated by an em dash, but it was hard to distinguish that from the minus character. To make the system easier for searching, I want to take all five different types of dashes, and map them onto one unique character (minus), that way the searcher doesn't need to know specifically which symbol was initially entered.
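The dash mapping described above could be sketched roughly as follows. This is only an illustration, not a complete answer to the question: the function names `normalize_dashes` and `search_key` are made up for this example, and it assumes that "dash" means any character in the Unicode category Pd (dash punctuation) plus U+2212 MINUS SIGN, which is classified as a math symbol rather than a dash. The name as entered is left untouched; only the search key is normalized.

```python
import unicodedata

# Assumption: U+2212 MINUS SIGN is listed explicitly because its Unicode
# category is Sm (math symbol), so a Pd check alone would miss it.
EXTRA_DASHES = {"\u2212"}


def normalize_dashes(text: str) -> str:
    """Replace every dash-like character with ASCII hyphen-minus."""
    return "".join(
        "-" if unicodedata.category(ch) == "Pd" or ch in EXTRA_DASHES else ch
        for ch in text
    )


def search_key(name: str) -> str:
    """Build a key for searching; the stored name itself is not modified."""
    composed = unicodedata.normalize("NFC", name)
    return normalize_dashes(composed).casefold()


# An em dash, an en dash and a plain hyphen all produce the same key.
print(search_key("Smith\u2014Jones") == search_key("smith-jones"))
```

Checking the Unicode category instead of hard-coding five code points also picks up dashes you did not think of, but whether that is desirable for every script is a judgment call.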

The problem exists for dashes, probably quotes as well, but also how many other symbols?

Recommended answer

There's a good article by the W3C called "Personal names around the world" that explains the problems (and possible solutions) pretty well. It was originally a two-part blog post by Richard Ishida (part 1 and part 2).

Personally I'd say: support every printable Unicode character and, to be safe, provide just a single field "name" that contains the full, formatted name. This way you can store pretty much every form of name. You might need more structured storage, but then don't expect to be able to store every single combination in a structured form, as there are simply too many different ones.
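As a rough sketch of what "every printable Unicode character in a single field" might look like in practice, the check below rejects only characters that are clearly not part of a name. The function name `is_acceptable_name` and the exact set of rejected categories are assumptions made for this example; in particular, format characters (category Cf, such as the zero-width non-joiner) are deliberately allowed here because some scripts rely on them.

```python
import unicodedata

# Assumption: these general categories are treated as "not printable":
# control, surrogate, private use, unassigned, line and paragraph separators.
REJECTED_CATEGORIES = {"Cc", "Cs", "Co", "Cn", "Zl", "Zp"}


def is_acceptable_name(name: str) -> bool:
    """Accept any non-empty full name made of printable Unicode characters."""
    stripped = name.strip()
    if not stripped:
        return False
    return all(
        unicodedata.category(ch) not in REJECTED_CATEGORIES for ch in stripped
    )


# The whole formatted name goes into one field, whatever its structure.
for candidate in ["Björk Guðmundsdóttir", "Mary-Jane O'Neil", "Bad\x00Name"]:
    print(repr(candidate), is_acceptable_name(candidate))
```

No maximum length is enforced here, since the answer does not suggest one.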
