What is the difference between utf8mb4 and utf8 charsets in MySQL?

Question

What is the difference between utf8mb4 and utf8 charsets in MySQL?
I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know what the difference is between the 'utf8mb4' group of encodings and the other encoding types defined in MySQL server.

Are there any particular benefits of, or recommendations for, using utf8mb4 rather than utf8?

Recommended answer

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" only stores a maximum of three bytes per code point.
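
To make the one-to-four-byte claim concrete, here is a minimal Python sketch that encodes a few sample code points as standard UTF-8 and counts the bytes. The helper name `utf8_byte_length` and the sample characters are illustrative choices, not part of the original answer, and the code does not talk to MySQL at all.

```python
def utf8_byte_length(ch: str) -> int:
    """Return how many bytes a single code point occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative samples: Latin letter, accented letter, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_byte_length(ch)} byte(s)")
```

The last sample is the only one that needs four bytes, which is exactly the case MySQL's three-byte "utf8" cannot represent.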

So the character set "utf8" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane". See also Comparison of Unicode encodings.
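
The range restriction can be expressed as a simple client-side check. The sketch below assumes that "fits in MySQL's utf8" means every code point is at most 0xFFFF, as the paragraph above states. The names `BMP_MAX` and `fits_in_three_byte_utf8` are hypothetical, and this is only a rough pre-check, not how the server itself validates data.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def fits_in_three_byte_utf8(text: str) -> bool:
    """Return True if every code point in text lies inside the BMP."""
    return all(ord(ch) <= BMP_MAX for ch in text)


print(fits_in_three_byte_utf8("h\u00e9llo \u20ac"))   # BMP-only text
print(fits_in_three_byte_utf8("hi \U0001F600"))      # contains an emoji
```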

Here is what the official documentation has to say:


The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:


  • For a BMP character, utf8 and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.

So if you want your column to support storing characters lying outside the BMP (and you usually want to), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?.
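
If you are unsure whether your data actually contains characters outside the BMP, one option is to scan some sample text before deciding. The following Python sketch lists each supplementary character with its position and code point. The function name `supplementary_characters` is made up for illustration, and the sketch assumes the text is already available as a Python string rather than read from a MySQL column.

```python
from typing import List, Tuple


def supplementary_characters(text: str) -> List[Tuple[int, str]]:
    """List (index, 'U+XXXX') for every code point above the BMP in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# Any non-empty result means the text needs a four-byte-capable charset.
print(supplementary_characters("ok \U0001F600"))
```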
