Why is a table's index storage size bigger after changing the charset from utf8mb4 to utf8?


Question

I executed: alter table device_msg convert to character set 'utf8' COLLATE 'utf8_unicode_ci';

As I expected, the table's data size got smaller.

But at the same time, the table's index size got bigger.

What happened, and why?

PS: the table's data size and index size are taken from information_schema.TABLES.

DB engine: InnoDB
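
The figures below were presumably produced by a query along these lines. This is only a sketch: the schema name 'mydb' is a placeholder that does not appear in the question, and the column aliases are chosen here for readability.

```sql
-- Sketch: read the size estimates for device_msg from information_schema.TABLES.
-- 'mydb' is a hypothetical schema name; replace it with the real database.
SELECT
    table_name,
    ROUND(data_length  / 1024 / 1024, 2) AS data_mb,   -- clustered index (row data)
    ROUND(index_length / 1024 / 1024, 2) AS index_mb,  -- secondary indexes
    ROUND((data_length + index_length) / 1024 / 1024, 2) AS total_mb,
    avg_row_length
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
  AND table_name   = 'device_msg';
```

For InnoDB these values are estimates derived from table statistics, so running ANALYZE TABLE device_msg; before the query may bring them closer to the actual sizes.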

Table before:

CREATE TABLE `device_msg` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `time` datetime(3) NOT NULL,
  `msg` json NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8mb4;

Table after:

CREATE TABLE `device_msg` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
  `time` datetime(3) NOT NULL,
  `msg` json NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


Before:

totalSize: 2.14 GB
indexSize: 282.98 MB
dataSize: 1.86 GB
avg_row_len:  297B

After:

totalSize: 1.93 GB
indexSize: 413.97 MB
dataSize: 1.52 GB
avg_row_len:  260B

If the data in information_schema.TABLES is not accurate, how can I make it accurate?

Recommended answer

Here is what I think, based on what the MySQL documentation says about InnoDB limitations:

https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html

By default, the index key prefix length limit is 767 bytes.

If an indexed column exceeds this size, it will be truncated. I assume your indexed column values have 255 characters.

With utf8mb4, a character takes up to 4 bytes, so the limit is around 191 characters. So 191 characters are added to the index, and the other 64 characters (255 - 191 = 64) are truncated from the index.

When you change the encoding to utf8 (where a character takes up to 3 bytes), the index limit becomes around 255 characters. That means your column values, all 255 characters, are added to the index without truncation.

The number of characters added to the index increased from 191 to 255, so the index size increased as well.
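
The arithmetic behind these limits can be checked directly in MySQL. The sketch below only restates the numbers above; the last two columns are an addition here, using the varchar(30) declaration of sn from the question instead of the 255 characters this answer assumes, so the two cases can be compared against the 767-byte limit.

```sql
-- 767 is the default InnoDB index key prefix limit in bytes quoted above.
SELECT
    767 DIV 4 AS max_chars_utf8mb4,     -- 191: at most 4 bytes per character
    767 DIV 3 AS max_chars_utf8,        -- 255: at most 3 bytes per character
    30 * 4    AS sn_max_bytes_utf8mb4,  -- 120: sn is varchar(30) in the question
    30 * 3    AS sn_max_bytes_utf8;     -- 90
```

Both byte counts for sn fall well below 767, so it is worth checking whether the 255-character assumption actually applies to this table before relying on the truncation explanation.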
