Does using ASCII/Latin Charset speed up the database?

Question

It would seem that using the ASCII charset for most fields, and specifying utf8 only for the fields that need it, would reduce the amount of I/O the database must perform by 100%.

Does anyone know if this is true?

Update: The above was not really my question. I should have said: use Latin as the default character set and specify utf8mb4 only for the fields that need it. The thinking is that using 1 byte vs 2 bytes should improve I/O by 100%. Sorry for the confusion.

Recommended answer

@RickJames is right: you should not worry about saving space by choosing ASCII or utf8 over utf8mb4.

utf8 and utf8mb4 are variable-length character encodings. The table in the Wikipedia article on UTF-8 illustrates how each character automatically takes 1, 2, 3, or 4 bytes, depending on the code point being encoded. The leading bits of the first byte indicate how many bytes the character occupies, up to 4.

The Wikipedia article explains it clearly:

The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean characters. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

You don't have to do anything to choose single-byte versus multi-byte mode. This is just the way the encoding works. Each character automatically uses the number of bytes it needs, and no more.
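
As a rough illustration of those byte counts, here is a minimal Python sketch. It uses Python's built-in "utf-8" codec as a stand-in for what MySQL stores under utf8mb4; the helper name utf8_len and the sample characters are made up for this example.

```python
# Sample characters, one from each size class described above.
# Escapes are used so the exact code points are unambiguous.
samples = {
    "A": "US-ASCII letter",
    "\u00e9": "Latin letter with a diacritic (e-acute)",
    "\u4e2d": "common CJK character",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for ch, description in samples.items():
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s) ({description})")
```

Nothing in the code selects a "mode": the encoder picks the width per character, which is the point made above.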

So there is no advantage to using utf8 over utf8mb4, and no advantage to using ASCII over either, unless you need to restrict the characters allowed in a string.
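
To see why the choice makes no difference for ASCII-only data, the sketch below compares encoded sizes under Python's "ascii", "latin-1", and "utf-8" codecs. These codecs are only approximations of MySQL's ascii, latin1, and utf8mb4 character sets, the function name encoded_sizes is hypothetical, and the sketch ignores any per-row or per-column storage overhead in the database.

```python
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Map each codec name to the byte length of `text` in that codec.

    The value is None when the codec cannot represent the text at all.
    """
    sizes: Dict[str, Optional[int]] = {}
    for codec in ("ascii", "latin-1", "utf-8"):
        try:
            sizes[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes


# ASCII-only text takes the same number of bytes in all three codecs.
print(encoded_sizes("hello world"))

# A string with one accented letter: ASCII cannot store it, and only the
# accented character itself costs an extra byte in UTF-8.
print(encoded_sizes("caf\u00e9"))
```

Under these assumptions, sizes diverge only for the non-ASCII characters themselves, so a column holding mostly ASCII text occupies about the same space either way.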

For what it's worth, the character set MySQL calls "utf8" is an alias for utf8mb3, which supports only characters that need at most three bytes in UTF-8. The MySQL server team blog (https://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/) says that utf8mb4 is faster, at least given performance improvements in MySQL 8.0, and that utf8mb3 should be considered deprecated. The MySQL 8.0.11 release notes say that utf8 will be redefined as an alias for utf8mb4 in some future version of MySQL.
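
The practical difference between the two can be sketched as a check on whether every character in a string fits in three UTF-8 bytes. This is an illustration in plain Python, not a MySQL API; the function name fits_in_three_bytes is hypothetical.

```python
def fits_in_three_bytes(text: str) -> bool:
    """Return True if every character of `text` needs at most 3 bytes in UTF-8.

    Such text stays within the Basic Multilingual Plane, which is the range
    a three-byte-per-character UTF-8 subset can represent.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Accented Latin and common CJK characters fit within three bytes each.
print(fits_in_three_bytes("na\u00efve \u4e2d\u6587"))

# An emoji needs four bytes, so it falls outside the three-byte subset.
print(fits_in_three_bytes("ok \U0001F600"))
```

Text that fails this check is the kind that needs utf8mb4 rather than the three-byte "utf8".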
