Why is Swift counting this Grapheme Cluster as two characters instead of one?

Views: 102 · Posted: 2020/10/29 5:22:11 · Tags: swift, unicode, emoji, grapheme


Question

Generally, Swift is really smart about counting a grapheme cluster as a single character. If I want to make a Lebanese flag, for example, I can combine the two Unicode characters

U+1F1F1 REGIONAL INDICATOR SYMBOL LETTER L

U+1F1E7 REGIONAL INDICATOR SYMBOL LETTER B

and, as expected, this is one character in Swift:

let s = "\u{1f1f1}\u{1f1e7}"
assert(s.characters.count == 1)
assert(s.utf16.count == 4)
assert(s.utf8.count == 8)

However, let's say I want to make a Bicyclist emoji of Fitzpatrick Type-5. If I combine

U+1F6B4 BICYCLIST

U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE-5

Swift counts this combination as two characters!

let s = "\u{1f6b4}\u{1f3fe}"
assert(s.characters.count == 2)   // <----- WHY?
assert(s.utf16.count == 4)
assert(s.utf8.count == 8)
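
For reference, the counts in both snippets can be compared side by side. This is only a minimal sketch, assuming Swift 4 or later, where `count` is available directly on `String` and `s.characters` is no longer needed. The names `flag` and `cyclist` are illustrative. The character count is printed rather than asserted because it depends on the grapheme-breaking rules that ship with the toolchain, and newer Swift versions may report 1 for the cyclist.

```swift
// Sketch: compare the different views of the two strings from the question.
let flag = "\u{1F1F1}\u{1F1E7}"      // REGIONAL INDICATOR L + B
let cyclist = "\u{1F6B4}\u{1F3FE}"   // BICYCLIST + FITZPATRICK TYPE-5

for (name, s) in [("flag", flag), ("cyclist", cyclist)] {
    // `count` is the number of Characters (grapheme clusters) and may vary
    // by Swift version; the other three views are fixed by the encoding.
    print(name,
          "characters:", s.count,
          "scalars:", s.unicodeScalars.count,
          "utf16:", s.utf16.count,
          "utf8:", s.utf8.count)
}
```

The `unicodeScalars`, `utf16`, and `utf8` counts are the same for both strings (2, 4, and 8), so only the `Character` view can differ between them.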

Why is this two characters instead of one?

To show why I would expect it to be 1, note that this cluster is actually interpreted as a valid emoji (🚴🏾).

Recommended answer

Part of the answer is given in the bug report mentioned in emrys57's comment. When splitting a Unicode string into "characters", Swift apparently uses the Grapheme Cluster Boundaries defined in UAX #29, Unicode Text Segmentation. That specification has a rule not to break between regional indicator symbols, but it has no such rule for emoji modifiers. So, according to UAX #29, the string "\u{1f6b4}\u{1f3fe}" contains two grapheme clusters. See this message from Ken Whistler on the Unicode mailing list for an explanation:

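"This is because the fallback behavior for the modifiers is simply as independent pictographs, i.e. the color swatch images. [...] You need additional, specific knowledge about these sequences; it doesn't just fall out from a default implementation of UAX #29 grapheme clusters."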
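
The difference between the two cases can be sketched as a toy boundary check over a pair of adjacent Unicode scalars. This is not the UAX #29 algorithm and not Swift's actual implementation. The helper names and the `joinModifiers` flag are hypothetical, with the flag standing in for the "additional, specific knowledge" about modifier sequences that the quote mentions.

```swift
// Toy model of the two cases discussed above; NOT the full UAX #29 algorithm.

/// True for REGIONAL INDICATOR SYMBOL LETTER A...Z (U+1F1E6...U+1F1FF).
func isRegionalIndicator(_ s: Unicode.Scalar) -> Bool {
    let range: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
    return range.contains(s.value)
}

/// True for the five Fitzpatrick emoji modifiers (U+1F3FB...U+1F3FF).
func isEmojiModifier(_ s: Unicode.Scalar) -> Bool {
    let range: ClosedRange<UInt32> = 0x1F3FB...0x1F3FF
    return range.contains(s.value)
}

/// Returns true if this toy model places a cluster boundary between `a` and `b`.
/// `joinModifiers` is a hypothetical switch for modifier-sequence awareness.
func hasBoundary(between a: Unicode.Scalar, and b: Unicode.Scalar,
                 joinModifiers: Bool) -> Bool {
    // Rule from the answer: do not break between regional indicators.
    if isRegionalIndicator(a) && isRegionalIndicator(b) { return false }
    // Extra knowledge that the default rules described above lack.
    if joinModifiers && isEmojiModifier(b) { return false }
    return true
}

let letterL: Unicode.Scalar = "\u{1F1F1}"
let letterB: Unicode.Scalar = "\u{1F1E7}"
let bicyclist: Unicode.Scalar = "\u{1F6B4}"
let type5: Unicode.Scalar = "\u{1F3FE}"

print(hasBoundary(between: letterL, and: letterB, joinModifiers: false))  // false
print(hasBoundary(between: bicyclist, and: type5, joinModifiers: false))  // true
print(hasBoundary(between: bicyclist, and: type5, joinModifiers: true))   // false
```

With `joinModifiers` set to `false`, the model reproduces the counts in the question: the flag stays together as one cluster and the cyclist splits into two.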
