Choosing a table collation for universal characters

Question

I'm working on a backend that needs to store universal characters.

I've chosen the utf8mb4 table encoding (character set) for that purpose. I also have to choose a table collation.

The most straightforward option is the utf8mb4_general_ci table collation. Besides the general one, there are about 20 other collations to choose from. What is the purpose of the more specific ones? Does utf8mb4_general_ci, or maybe utf8mb4_unicode_520_ci, cover all of them? Which one should I use if I want to store characters ranging from Chinese all the way to Arabic?

Recommended answer

...general_ci is simple. It does not treat a 2-character combination (such as a letter followed by a non-spacing mark) as equal to the equivalent single character.
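
A small Python illustration of that difference (not how MySQL implements collations): the == operator compares raw code points, roughly like the simple comparison described above, while unicodedata.normalize shows that a precomposed letter and a letter plus a combining accent are canonically equivalent, which is the kind of equivalence UCA-based collations are designed to respect. The helper names same_code_points and canonically_equal are hypothetical.

```python
import unicodedata

# "é" spelled two ways: one precomposed code point, or "e" + a combining (non-spacing) accent.
PRECOMPOSED = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def same_code_points(a: str, b: str) -> bool:
    """Plain code-point comparison with no knowledge of combining marks."""
    return a == b


def canonically_equal(a: str, b: str) -> bool:
    """Compare after NFC normalization, which composes "e" + U+0301 into U+00E9."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(same_code_points(PRECOMPOSED, DECOMPOSED))   # False
    print(canonically_equal(PRECOMPOSED, DECOMPOSED))  # True
```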

...unicode_520_ci is based on version 5.2.0 of the Unicode Collation Algorithm (UCA), the latest version available when MySQL adopted it. It handles things like giving Emoji an ordering, which earlier versions did not have.
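
A toy model of why an "ordering for Emoji" matters, in Python. These are not MySQL's real weight tables; the sketch simply assumes that the older collation gives every character outside the Basic Multilingual Plane (where most emoji live) one shared weight, while the newer one gives each code point its own. Under that assumption, two different emoji compare equal in the old scheme and sort stably in the new one. All function names are hypothetical.

```python
SUSHI = "\U0001F363"  # U+1F363 SUSHI
BEER = "\U0001F37A"   # U+1F37A BEER MUG
SHARED_WEIGHT = 0xFFFD  # toy assumption: one weight for every non-BMP character


def legacy_weight(ch: str) -> int:
    """Toy: every supplementary (code point > 0xFFFF) character collapses to one weight."""
    cp = ord(ch)
    return SHARED_WEIGHT if cp > 0xFFFF else cp


def newer_weight(ch: str) -> int:
    """Toy: every code point keeps its own distinct weight."""
    return ord(ch)


def sort_key(text: str, weight) -> list:
    """Turn a string into a list of per-character weights."""
    return [weight(ch) for ch in text]


if __name__ == "__main__":
    # Old scheme: the two emoji are indistinguishable.
    print(sort_key(SUSHI, legacy_weight) == sort_key(BEER, legacy_weight))  # True
    # New scheme: they differ, so ORDER BY and uniqueness checks can tell them apart.
    print(sort_key(SUSHI, newer_weight) == sort_key(BEER, newer_weight))    # False
```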

With MySQL 8.0, the preferred collation is utf8mb4_0900_ai_ci, based on UCA 9.0.0.
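
In MySQL 8.0 collation names, _ai means accent-insensitive and _ci means case-insensitive. The Python sketch below approximates that comparison rule with a hypothetical key function (NFD decomposition, dropping combining marks, then casefold()); it only illustrates the idea and is not the UCA 9.0.0 algorithm MySQL actually runs.

```python
import unicodedata


def ai_ci_key(text: str) -> str:
    """Rough accent- and case-insensitive comparison key (illustration only)."""
    # NFD splits accented letters into a base letter followed by combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks makes the comparison accent-insensitive.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes it case-insensitive (more thorough than lower()).
    return no_accents.casefold()


def ai_ci_equal(a: str, b: str) -> bool:
    return ai_ci_key(a) == ai_ci_key(b)


if __name__ == "__main__":
    print(ai_ci_equal("Résumé", "resume"))  # True
    print(ai_ci_equal("cafe", "cafes"))     # False
```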

...<language>_ci collations handle variations found in a given language. For example, they decide whether ch and ll in Spanish are treated as single letters that sort between cz and d, and between lz and m, respectively (as in traditional Spanish).
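
The Spanish example is a contraction: two characters form one collation element. A minimal Python sketch of a traditional-Spanish sort key, assuming only lowercase unaccented letters plus ñ (anything else sorts after the alphabet by code point). MySQL ships this behaviour in its traditional-Spanish collations, for example utf8mb4_spanish2_ci; the function below is a hypothetical illustration, not that collation's implementation.

```python
from typing import List

# Traditional Spanish alphabet (toy model): "ch" and "ll" are single letters.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
CONTRACTIONS = ("ch", "ll")


def traditional_spanish_key(word: str) -> List[int]:
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in CONTRACTIONS:
            key.append(RANK[pair])  # two characters, one collation element
            i += 2
        else:
            ch = word[i]
            # Characters outside the toy alphabet sort after it, by code point.
            key.append(RANK.get(ch, len(ALPHABET) + ord(ch)))
            i += 1
    return key


if __name__ == "__main__":
    words = ["dedo", "chico", "cuna", "llama", "luz", "mano"]
    print(sorted(words))                               # plain code-point order
    print(sorted(words, key=traditional_spanish_key))  # cuna before chico, luz before llama
```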

For general use, don't use ...general_ci; use the most recent Unicode (UCA) based collation your MySQL version offers. For language-specific needs, pick one of the language-specific collations.
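
The advice above can be captured in a small helper. This is a hypothetical Python sketch, not part of any MySQL client library: it assumes utf8mb4_unicode_520_ci is available on pre-8.0 servers (MySQL 5.6 and 5.7), lets the caller pass a language-specific collation they have confirmed with SHOW COLLATION, and builds a CREATE TABLE statement with table-level CHARACTER SET and COLLATE options.

```python
from typing import Optional


def pick_collation(mysql_major: int, language_collation: Optional[str] = None) -> str:
    """Hypothetical helper: latest UCA-based collation unless a language one is given."""
    if language_collation is not None:
        collation = language_collation        # e.g. a collation listed by SHOW COLLATION
    elif mysql_major >= 8:
        collation = "utf8mb4_0900_ai_ci"      # UCA 9.0.0
    else:
        collation = "utf8mb4_unicode_520_ci"  # UCA 5.2.0; assumes MySQL 5.6 or 5.7
    if not collation.startswith("utf8mb4_"):
        raise ValueError(f"{collation!r} is not a utf8mb4 collation")
    return collation


def create_table_sql(table: str, columns_sql: str, collation: str) -> str:
    """Build DDL with table-level charset and collation (identifiers are not escaped)."""
    return (f"CREATE TABLE {table} ({columns_sql}) "
            f"CHARACTER SET utf8mb4 COLLATE {collation}")


if __name__ == "__main__":
    print(create_table_sql("messages", "body TEXT", pick_collation(8)))
```

Because nothing is quoted or escaped, the builder is only suitable for trusted, hard-coded table definitions.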

I do not know how (or even whether) Chinese and Arabic are sorted differently by the different collations. However, I see a ...persian_ci collation, so I suspect sorting does differ, at least for Persian.
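
One concrete reason an Arabic-script language can need its own collation, as a small Python illustration (a hypothetical key, not the rules of MySQL's ...persian_ci): letters that Persian adds to the Arabic alphabet, such as PEH (U+067E), were encoded after the basic Arabic letters. Plain code-point order therefore puts PEH after TEH (U+062A), whereas in the Persian alphabet PEH comes right after BEH (U+0628).

```python
# First seven letters of the Persian alphabet, in Persian alphabetical order:
# ALEF, BEH, PEH, TEH, THEH, JEEM, TCHEH
PERSIAN_ORDER = "\u0627\u0628\u067e\u062a\u062b\u062c\u0686"
BEH, PEH, TEH = "\u0628", "\u067e", "\u062a"


def persian_key(text: str) -> list:
    """Toy key: rank by position in PERSIAN_ORDER; other characters sort after, by code point."""
    key = []
    for ch in text:
        if ch in PERSIAN_ORDER:
            key.append(PERSIAN_ORDER.index(ch))
        else:
            key.append(len(PERSIAN_ORDER) + ord(ch))
    return key


if __name__ == "__main__":
    letters = [TEH, PEH, BEH]
    print([hex(ord(c)) for c in sorted(letters)])                   # code-point order
    print([hex(ord(c)) for c in sorted(letters, key=persian_key)])  # Persian order
```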

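Do use utf8mb4, not utf8, especially since you need Chinese: MySQL's utf8 (utf8mb3) stores at most 3 bytes per character, so it cannot hold characters outside the Basic Multilingual Plane, which include emoji and some rarer Chinese characters.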
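
The 3-byte limit is easy to check in application code before inserting data. A minimal Python sketch with hypothetical function names, assuming you only need to know whether a string would fit in utf8mb3 (every character encoding to at most 3 UTF-8 bytes):

```python
def max_utf8_bytes(text: str) -> int:
    """Largest UTF-8 byte length of any single character in text (0 for empty text)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes, i.e. lies in the BMP."""
    return max_utf8_bytes(text) <= 3


if __name__ == "__main__":
    print(fits_utf8mb3("中文"))        # common Chinese characters: 3 bytes each
    print(fits_utf8mb3("\U00020000"))  # U+20000, CJK Unified Ideographs Extension B: 4 bytes
    print(fits_utf8mb3("\U0001F600"))  # an emoji: 4 bytes
```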
