Why did changing from utf8 to utf8mb4 slow down my database?

Question

All the MySQL tables in my PHP web application are MyISAM with utf8 encoding. Since records can be generated from a companion app while it's offline, my table keys are randomly generated, alphanumeric VARCHARs; these fields are set to binary with utf8_bin encoding so they can be case-sensitive.

I recently decided to change the encoding of all my text fields, to support emojis that some users like to enter. I went ahead and changed all utf8 fields to utf8mb4, including the keys. I immediately started seeing performance issues, where complex SELECT queries on one of the larger tables took more than a minute, and then other queries queued up waiting for table locks. I changed the encoding of the primary key field on that table back to utf8, and performance returned to normal. A couple days later, I changed that one field to utf8mb4 again, the queries started queueing up again, and I changed it back to restore the normal performance.

So this runs smoothly:

`ID` varchar(8) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT ''

But this causes problems:

`ID` varchar(8) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT ''

Everything I've read says utf8 and utf8mb4 should have equivalent performance, but I'm seeing a distinct difference in my case. Does this make sense?

It's not really a problem to keep the key fields at utf8, since I don't foresee ever using more than simple alphanumeric characters there. But I would have liked to have all the fields set to the same encoding just for consistency and simplicity of maintenance (don't have to remember to set user-populated fields to one encoding and key fields to another encoding).

Regarding @MandyShaw's comments

When I work with the database with the Sequel Pro Mac app, the console constantly shows pairs of SET NAMES 'utf8' and SET NAMES 'utf8mb4' entries, so that does suggest not everything is set correctly. However, here's what I have currently:

MySQL [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

I read that character_set_system can't be changed from utf8 and character_set_filesystem should be binary.

Sequel Pro's connection encoding was set to Autodetect, but when I change it explicitly to utf8mb4, then open a new connection, I still see all those encoding changes in the console.

Is there something else I need to change to use this encoding consistently?

Recommended answer

utf8 is really utf8mb3 and may use at most 3 bytes per character, while utf8mb4 may use up to 4 bytes per character. For VARCHAR columns this does not normally make much difference, since MySQL stores only as many bytes as needed (unless you have created your MyISAM tables with ROW_FORMAT=FIXED).
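
The per-character byte limits can be checked on the server itself. This is a minimal sketch, assuming you have read access to information_schema; on some newer servers the 3-byte set is listed as utf8mb3 rather than utf8, hence the LIKE pattern:

```sql
-- MAXLEN is the maximum number of bytes one character can occupy.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf8%';
```

For a varchar(8) key, that is a worst case of 24 bytes in the 3-byte character set and 32 bytes in utf8mb4, plus the length prefix.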

However, during query execution, MySQL may create temporary tables in the MEMORY storage engine, which does not support variable-length rows. Each VARCHAR value is therefore stored at its maximum byte length, so a utf8mb4 column takes 4 bytes per character there instead of 3. These temporary tables have a maximum size (the smaller of tmp_table_size and max_heap_table_size), and if that size is exceeded, the temporary table is converted to an on-disk MyISAM or InnoDB table (depending on your version of MySQL). The status variable Created_tmp_disk_tables is incremented each time this happens. If that counter is climbing, see whether increasing the values of max_heap_table_size and tmp_table_size helps.
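
One way to test this is sketched below. The 64 MB figure is only an illustrative assumption, not a recommendation; pick a value that fits your server's memory.

```sql
-- Note the counters, run the slow SELECT, then check them again.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- Current limits; the smaller of the two caps in-memory temporary tables.
SHOW VARIABLES WHERE Variable_name IN ('tmp_table_size', 'max_heap_table_size');

-- Raise both for the current session only (64 MB is an assumed example value).
SET SESSION tmp_table_size = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
```

If the query is fast again after raising the limits in the same session, the disk conversion is the likely cause, and the values can then be set globally or in the server configuration. Running EXPLAIN on the query and looking for "Using temporary" in the Extra column shows whether it needs a temporary table at all.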

Alternatively, upgrade to MySQL 8.0 where a new storage engine that supports variable-length rows is used for internal temporary tables.
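
After an upgrade, the engine used for in-memory internal temporary tables can be confirmed as follows. This is a sketch that assumes a MySQL 8.0 server; neither variable exists in earlier versions.

```sql
SELECT VERSION();

-- TempTable is the 8.0 default; MEMORY is the older fixed-width engine.
SHOW VARIABLES LIKE 'internal_tmp_mem_storage_engine';

-- Memory the TempTable engine may use before spilling to disk.
SHOW VARIABLES LIKE 'temptable_max_ram';
```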
