MySQL varchar index length

Problem description

I have a table like this:

CREATE TABLE `products` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(512) NOT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8;

and another like this:

CREATE TABLE `product_variants` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(11) unsigned NOT NULL,
  `product_code` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `product_code` (`product_code`),
  KEY `product_variant_product_fk` (`product_id`),
  CONSTRAINT `product_variant_product_fk` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037 DEFAULT CHARSET=utf8;

and an SQL query like this:

SELECT p.id AS id, p.name AS name, p.description AS description, pv.id AS product_variant_id, pv.product_code AS product_code
FROM products p
INNER JOIN product_variants pv ON pv.product_id = p.id
ORDER BY p.name ASC
LIMIT 300 OFFSET 0;

If I EXPLAIN it, I get this:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

For a million rows, this is pretty slow. I've tried adding an index on products.name with:

ALTER TABLE products ADD INDEX `product_name_idx` (name(512));

which gives:

mysql> show indexes from products;
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| products |          0 | PRIMARY          |            1 | id              | A         |      993658 |     NULL | NULL   |      | BTREE      |         |               |
| products |          1 | product_manf_fk  |            1 | manufacturer_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| products |          1 | product_name_idx |            1 | name            | A         |         201 |      255 | NULL   |      | BTREE      |         |               |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

I think that the Sub_part column shows the prefix that has been indexed (in bytes), as described on this page.

When I re-explain the query, I get:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

which suggests that the new index is not being used. As described on this page, indexes will not be used for sorting if they are prefix indexes. In fact, if I truncate the data with:

alter table products modify `name`  varchar(255) not null;

EXPLAIN gives:

+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
| id | select_type | table | type  | possible_keys              | key                        | key_len | ref                                          | rows | Extra |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
|  1 | SIMPLE      | p     | index | PRIMARY                    | product_name_idx           | 767     | NULL                                         |  300 |       |
|  1 | SIMPLE      | pv    | ref   | product_variant_product_fk | product_variant_product_fk | 4       | oh_2c98c233_69fe_4f06_ad0d_fe6f85a5beac.p.id |    1 |       |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+

which I think backs that up. However, it says on this page that InnoDB tables can have up to 767 bytes of index. If the length is in bytes, why does it refuse to have more than 255? If it's in characters, how is it deciding the length of each UTF-8 character? Is it just assuming 3?

Also, I am using this version of MySQL:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)


Recommended answer

I must revise my answer due to my research. I originally posted this (quoting myself):


I believe the answer is that you cannot know how many characters will be in the index because you cannot know how many bytes your characters will be (unless you do something to exclude multi-byte characters).

I'm not sure; it might still be correct, but not in quite the way I was thinking.

Here is the correct answer:

MySQL assumes 3 bytes per utf8 character. 255 characters is the maximum index size you can specify per column, because 256 x 3 = 768, which exceeds the 767-byte limit.

If you don't specify index size, MySQL chooses the maximum size (i.e. 255 per column). A UNIQUE constraint cannot be put on a utf8 column whose length is greater than 255, because a unique index must contain the entire cell value. But a regular index can be used - it will just index the first 255 characters (or first 767 bytes?). And that is where there is still some mystery for me.
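The worst-case arithmetic behind the 255-character limit can be checked with a short sketch. It is written in Python purely for illustration; the function name `prefix_fits` and the constants are my own, and the figure of 3 bytes per character is the assumption described above for utf8, not something read from MySQL itself.

```python
# Worst-case size check for an index prefix on a utf8 column.
# Assumption: MySQL reserves the maximum of 3 bytes for every utf8 character.
INNODB_PREFIX_LIMIT_BYTES = 767  # default InnoDB limit cited in the question
UTF8_MAX_BYTES_PER_CHAR = 3      # assumed worst case for MySQL's utf8


def prefix_fits(prefix_chars,
                bytes_per_char=UTF8_MAX_BYTES_PER_CHAR,
                limit_bytes=INNODB_PREFIX_LIMIT_BYTES):
    """Return True if a prefix of `prefix_chars` characters fits the byte limit."""
    return prefix_chars * bytes_per_char <= limit_bytes


print(prefix_fits(255))  # 255 * 3 = 765 bytes
print(prefix_fits(256))  # 256 * 3 = 768 bytes
```

Under that assumption, 255 is the largest prefix length that stays within 767 bytes, which matches the Sub_part value of 255 shown in the question.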

The mystery: I can see why MySQL assumes 3 bytes per character, for safety, because otherwise the UNIQUE constraint could be broken. But the docs seem to suggest that the index is actually sized in bytes, not characters. So, suppose you put a 255 char (765 byte) index on a varchar(256) column. If the characters you store are all ASCII, 1-byte characters, like A-Z, a-z, 0-9, then you can fit the entire column into the 767 byte index. And it seems like that is what would actually happen.

Below is some more information from my original answer about characters, bytes, etc.

According to Wikipedia, a UTF-8 character can be 1, 2, 3, or 4 bytes long. But, according to this MySQL documentation, the maximum character size is 3 bytes, and so any column index over 255 characters might hit that byte limit. But as I understand it, it might not. If most of your characters are in the ASCII range, then your average character size will be closer to 1 byte. If your average character size is, for example, 1.3 bytes (mostly 1 byte, but a significant number of 2-3 byte characters), then you could specify an index of 767/1.3

So, if you are storing mostly 1-byte characters, your actual character limit would be more like: 767 / 1.3 = 590. But it turns out that is not the way it works. 255 characters is the limit.
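To make the byte counts concrete, here is a small sketch of how many bytes individual characters take in UTF-8. It uses Python only as a convenient way to encode strings; the sample characters and the helper name `utf8_len` are my own choices for illustration.

```python
# Encoded size of individual characters in UTF-8.
# The sample characters are arbitrary examples, one per byte length.
SAMPLES = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "emoji outside plane 0",
}


def utf8_len(text):
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for char, label in SAMPLES.items():
    print(f"{label}: {utf8_len(char)} byte(s)")

# A 255-character ASCII string occupies only 255 bytes once encoded.
print(utf8_len("A" * 255))
```

So the stored data can be far smaller than the worst case, but as noted above, the 255-character limit applies regardless of what the characters actually are.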

This MySQL documentation says:

Prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for nonbinary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set.



It seems that MySQL is advising people to do a calculation/guesstimation like I just did in order to determine the key size for a varchar column. But in fact you cannot specify an index larger than 255 for utf8 columns.

Finally, if you refer back to my second link again, there is also this:


When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.

So it seems like you can get much larger indexes if you want to, with a bit of tweaking. Just make sure the row formats are DYNAMIC or COMPRESSED. You can probably specify an index of 1023 or 1024 characters in that case.
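As a rough check on that estimate, the largest prefix length under a given byte limit is the limit divided by the worst-case bytes per character, rounded down. The sketch below is illustrative Python with names of my own choosing; it assumes 3 bytes per character for utf8 and 4 for utf8mb4, and it only does the arithmetic, it does not query MySQL.

```python
# Largest prefix length (in characters) that fits a byte limit,
# assuming every character takes the charset's maximum width.
LIMITS_BYTES = {
    "default": 767,               # limit cited in the question
    "innodb_large_prefix": 3072,  # limit quoted from the documentation
}
MAX_BYTES_PER_CHAR = {
    "utf8": 3,     # assumed worst case for MySQL's utf8
    "utf8mb4": 4,  # assumed worst case for utf8mb4
}


def max_prefix_chars(limit_bytes, max_bytes_per_char):
    """Floor of the byte limit divided by the worst-case character width."""
    return limit_bytes // max_bytes_per_char


for limit_name, limit_bytes in LIMITS_BYTES.items():
    for charset, width in MAX_BYTES_PER_CHAR.items():
        chars = max_prefix_chars(limit_bytes, width)
        print(f"{limit_name:>20} {charset:>8}: {chars} characters")
```

By this arithmetic, 1024 rather than 1023 would be the ceiling for a utf8 column under the 3072-byte limit, though I have not verified that against a running server.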



By the way, you can store 4-byte characters by using the utf8mb4 character set. The utf8 character set apparently only stores plane 0 characters.

EDIT:

I just tried to create a composite index on a varchar(511) column with a tinyint(1) column and got the error message saying the max index size was 767 bytes. This makes me believe that MySQL assumes utf8 character set columns will contain 3 bytes per character (the maximum), and allows you to use 255 chars max. But perhaps that is only with composite indexes. I will update my answer as I find out more. But for now I'm leaving this as an edit.
