Choosing data type for MySQL?

Question

I have been researching and reading about SQL data types for a few days now (I know... I know, that's not very long), and one of the things that is hard for me to grasp is how to pick the best data type for expandability, efficiency, and ease of access.

I think it's pretty straightforward to choose between basic data types (e.g. int vs. varchar), but how do you choose between things like the blob and text types?

The MySQL manual pages are great, but they are not what we computer geeks love: efficient.

I think it would be awesome if we could compile a list of the MySQL data types, the general advantages and disadvantages of each, and when it would make sense to choose that data type.

Recommended answer

MySQL string types come in two variants: one without a character set label and one with a character set label.

A fixed length string, padded with spaces at the end, is CHAR(n). The matching type that has no character set label is BINARY(n). Storing the string "hello" in a CHAR(255) CHARSET utf8 will take 765 bytes: the string is padded with spaces to full length and stored as utf8, which has a worst-case space usage of 3 bytes per character, so 3*255 bytes are allocated.

A variable length string with one or two length bytes and no padding is VARCHAR(n). The matching type that has no character set label is VARBINARY(n). Storing the string "hello" in a VARCHAR(255) CHARSET utf8 will take 6 bytes (1 length byte plus 5 bytes for the actual text). Storing the string クリス in the same type will take 10 bytes (1 length byte plus 3 characters, each represented by 3 bytes).

mysql> select hex('クリス'), length(hex('クリス'))/2 as bytes;
+--------------------+--------+
| hex('クリス')      | bytes  |
+--------------------+--------+
| E382AFE383AAE382B9 | 9.0000 |
+--------------------+--------+
1 row in set (0.02 sec)
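The byte counts above can be reproduced outside the server. This is a minimal sketch in Python rather than SQL, and the helper names `char_bytes` and `varchar_bytes` are hypothetical. It assumes the simple model used in the text: 3 bytes per character reserved for a padded CHAR in utf8, and one length byte for the VARCHAR example.

```python
# Rough storage model for the examples above. length_bytes=1 follows the
# text's VARCHAR(255) example; it is an assumption, not a server lookup.
UTF8_MAX_BYTES_PER_CHAR = 3  # worst case for MySQL's 3-byte "utf8"


def char_bytes(n_chars: int) -> int:
    """Bytes reserved by a fully padded CHAR(n) CHARSET utf8 column."""
    return n_chars * UTF8_MAX_BYTES_PER_CHAR


def varchar_bytes(value: str, length_bytes: int = 1) -> int:
    """Length prefix plus the UTF-8 encoded payload, with no padding."""
    return length_bytes + len(value.encode("utf-8"))


if __name__ == "__main__":
    print("CHAR(255) utf8:", char_bytes(255), "bytes")
    for s in ("hello", "クリス"):
        # The hex dump mirrors what SELECT HEX(...) shows on the server.
        print(s, varchar_bytes(s), "bytes", s.encode("utf-8").hex().upper())
```

The hex string printed for クリス should match the HEX() output in the console session above.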

A variable length string with one, two, three or four length bytes is TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT, respectively. The matching types that have no character set label are TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB.
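The number of length bytes fixes the largest value each type can hold. As a small sketch (the function name `max_payload_bytes` is made up for illustration; the four-byte types are named LONGTEXT/LONGBLOB in MySQL), the limits follow directly from the width of the length prefix:

```python
def max_payload_bytes(length_bytes: int) -> int:
    """Largest byte count that fits in an unsigned prefix of this width."""
    return 2 ** (8 * length_bytes) - 1


# Length-prefix width per type, as listed in the text.
TEXT_TYPES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}

if __name__ == "__main__":
    for name, width in TEXT_TYPES.items():
        print(f"{name:<10} {width} length byte(s), up to {max_payload_bytes(width)} bytes")
```

The same limits apply to the BLOB family, since the two families differ only in whether they carry a character set label.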

A TEXT/BLOB-like type differs from a VARCHAR/VARBINARY-like type in how and where the data is stored. See http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for details on how TEXT/BLOB-like types are stored in InnoDB depending on version and ROW_FORMAT settings. For performance reasons, you want a recent version of InnoDB and "Barracuda" format tables.

MySQL is incapable of working with any data that is larger than max_allowed_packet in size (default: 1M at the time of writing; newer MySQL versions ship a larger default), unless you construct complicated and memory-intensive workarounds on the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LONGTEXT/LONGBLOB types useless in a default configuration.
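To see how the packet limit caps the larger types, here is a rough sketch. It assumes the 1M default quoted above and treats the packet size as a plain upper bound on a single value (in practice the statement around the value also has to fit). `usable_limit` is a hypothetical helper, not a MySQL function.

```python
MAX_ALLOWED_PACKET = 1 * 1024 * 1024  # the 1M default quoted in the text

# Declared maximum sizes in bytes, derived from 1 to 4 length bytes.
TYPE_LIMITS = {
    "TINYTEXT": 2 ** 8 - 1,
    "TEXT": 2 ** 16 - 1,
    "MEDIUMTEXT": 2 ** 24 - 1,
    "LONGTEXT": 2 ** 32 - 1,
}


def usable_limit(type_name: str, max_allowed_packet: int = MAX_ALLOWED_PACKET) -> int:
    """Upper bound on a single value: the smaller of type limit and packet size."""
    return min(TYPE_LIMITS[type_name], max_allowed_packet)


if __name__ == "__main__":
    for name in TYPE_LIMITS:
        print(f"{name:<10} declared {TYPE_LIMITS[name]:>10}  usable {usable_limit(name):>10}")
```

With this setting MEDIUMTEXT and the four-byte type end up with the same usable ceiling, which is the point the paragraph above makes.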

For types without a character set label (BINARY, VARBINARY and %BLOB%), MySQL will accept the data as received and write it to disk. For types with a character set label, MySQL will look at what you announced to the server as your client's character set with SET NAMES, and at the column's defined character set label. It will then convert from the connection character set to the column character set and write the converted data. You can check that with the HEX() function, e.g. SELECT HEX(str) FROM t WHERE id = ....

On retrieval, the connection's character set as announced with SET NAMES can be different from what it was at write time. MySQL will again check the column's character set label against the character set announced for this connection and, if necessary, convert to the connection character set.
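The write and read conversions described above can be imitated with ordinary codec calls. This is only an illustration of the idea, not MySQL's implementation: `store` and `retrieve` are hypothetical helpers, and Python's "latin-1" and "utf-8" codecs stand in for MySQL's latin1 and utf8 character sets (an assumption that holds for the character used here).

```python
from typing import Optional


def store(client_bytes: bytes, connection_charset: str,
          column_charset: Optional[str]) -> bytes:
    """Return the bytes that end up on disk.

    column_charset=None models BINARY/VARBINARY/BLOB: bytes are kept as received.
    """
    if column_charset is None:
        return client_bytes
    # Interpret the bytes as announced by SET NAMES, re-encode for the column.
    return client_bytes.decode(connection_charset).encode(column_charset)


def retrieve(stored: bytes, column_charset: str, connection_charset: str) -> bytes:
    """Convert stored column bytes to the charset announced by the reader."""
    return stored.decode(column_charset).encode(connection_charset)


if __name__ == "__main__":
    e_latin1 = "é".encode("latin-1")  # one byte
    print(store(e_latin1, "latin-1", None).hex().upper())     # binary column
    print(store(e_latin1, "latin-1", "utf-8").hex().upper())  # utf8 column
    # Client sends UTF-8 bytes but announces latin1: the text is double-encoded.
    print(store("é".encode("utf-8"), "latin-1", "utf-8").hex().upper())
    print(retrieve(b"\xc3\xa9", "utf-8", "latin-1").hex().upper())
```

The third call shows why the announced character set has to match what the client actually sends: the conversion is applied faithfully even when the announcement is wrong, and a HEX() check on the stored value is how you would notice.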

The performance penalty for that conversion is negligible either way compared to the time taken for the disk I/O such data incurs anyway, so performance-wise it hardly matters which type you choose. The rule is instead: choose a type with a character set label if you are working with textual data, and a type without one if you are not.

A related question that is often asked: Shall I choose CHAR or VARCHAR (BINARY or VARBINARY, respectively)?

For InnoDB, the answer is always: choose the variable length data type. There is no performance benefit from fixed length data types in InnoDB, ever, but there is a huge size penalty if you choose a fixed length data type and then do not use all the space in it. Plus, fixed length SQL string types have really weird rules regarding padding and trimming with spaces at the end, which you probably can't be bothered to learn. For other MySQL storage engines (MyISAM, for example) the case can be different, but almost never is.

Another related question: Shall I choose VARCHAR or TEXT for my strings (VARBINARY or BLOB, respectively)?

The answer is: use a recent version of InnoDB with Barracuda format tables, and then TEXT/BLOB. The reason is explained at length in http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-limitation/. The upshot is: with either VARCHAR or TEXT/BLOB in a pre-Barracuda format, you run the risk of overflowing the InnoDB row size limit if you have too many of them in a single row.
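A back-of-the-envelope sketch of that row size risk follows. The three constants are assumptions taken from how InnoDB is commonly documented rather than from the text above: an in-row limit of roughly 8126 bytes for the default 16 KB page, a 768-byte in-row prefix per long column in the pre-Barracuda formats, and a 20-byte pointer per off-page column in the Barracuda DYNAMIC format. Row headers and any other columns are ignored, and the helper names are hypothetical.

```python
ROW_SIZE_LIMIT = 8126         # assumed in-row limit for a 16 KB page
INLINE_PREFIX_OLD = 768       # assumed in-row bytes per long column, pre-Barracuda
INLINE_POINTER_DYNAMIC = 20   # assumed in-row bytes per off-page column, DYNAMIC


def in_row_bytes(n_long_columns: int, per_column: int) -> int:
    """In-row footprint of n long columns that have all overflowed off-page."""
    return n_long_columns * per_column


def fits(n_long_columns: int, per_column: int) -> bool:
    """True if the in-row parts alone stay within the assumed row size limit."""
    return in_row_bytes(n_long_columns, per_column) <= ROW_SIZE_LIMIT


if __name__ == "__main__":
    for n in (10, 11):
        print(n, "long columns, old format:", fits(n, INLINE_PREFIX_OLD),
              "| DYNAMIC:", fits(n, INLINE_POINTER_DYNAMIC))
```

Under these assumptions a row with eleven fully populated long columns no longer fits in the old format, while the same row in DYNAMIC format keeps only small pointers in the row.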

And finally: Shall I store files, images or other large blob or text data in the database?

The answer is: usually not. Serving files from a database (http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html) is an expensive operation compared to serving files from a filesystem. If at all possible, you'd want to serve them from the filesystem instead. There is a way around that, http://www.blobstreaming.org/, but that is advanced technology that requires you to have total control of your execution environment, which is never the case in a hosted environment.

To round this off: There are no variable length data types in MEMORY engine tables. So if you are seeing "using temporary" in an EXPLAIN output, this means:

  • VARCHAR is converted to CHAR in that temporary table
  • VARBINARY is converted to BINARY

If the temporary table created by this process becomes larger than tmp_table_size or max_heap_table_size, it is converted on the fly to MyISAM format and goes to disk.

Example: You are defining a Ruby Active Record class User which contains ten fields tagged as :string. Each of these ends up being VARCHAR(255) CHARSET utf8 in your Users table.

Elsewhere in your codebase, Users is being used in a way that involves a plan with "using temporary". You are instantly dying in disk operations under load, because each row of the Users table now uses at least 7650 bytes in MEMORY (ten columns of 3*255 bytes each), most of them spaces used as padding. This forces the temporary table to be converted to MyISAM and written to disk.

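  • Any %TEXT% or %BLOB% type cannot be represented in MEMORY at all, so the temporary table goes to disk as MyISAM even if it would have been small enough to stay in memory under the limits above.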

This implies that any query that touches a TEXT or BLOB type and has a plan with "using temporary" needs to be rewritten in order to avoid temporary tables hitting the disk.
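The temporary table rules above can be condensed into a few lines of arithmetic. This is a simplified sketch, not the server's logic: the function names are hypothetical, per-row overhead in MEMORY is ignored, and both limits are assumed to be 16M here, so check the values on your own server.

```python
from typing import Sequence

TMP_TABLE_SIZE = 16 * 1024 * 1024       # assumed value, check your server
MAX_HEAP_TABLE_SIZE = 16 * 1024 * 1024  # assumed value, check your server
BYTES_PER_UTF8_CHAR = 3                 # VARCHAR becomes padded CHAR in MEMORY


def memory_row_bytes(varchar_lengths: Sequence[int]) -> int:
    """Row width once every VARCHAR(n) utf8 is padded to a full CHAR(n)."""
    return sum(n * BYTES_PER_UTF8_CHAR for n in varchar_lengths)


def goes_to_disk(varchar_lengths: Sequence[int], has_text_or_blob: bool,
                 n_rows: int) -> bool:
    """Apply the two rules from the text to a temporary table."""
    if has_text_or_blob:
        return True  # TEXT/BLOB cannot be represented in MEMORY at all
    limit = min(TMP_TABLE_SIZE, MAX_HEAP_TABLE_SIZE)  # exceeding either one spills
    return memory_row_bytes(varchar_lengths) * n_rows > limit


if __name__ == "__main__":
    users = [255] * 10  # ten :string fields, as in the Active Record example
    print("row bytes in MEMORY:", memory_row_bytes(users))
    print("rows before spilling:", min(TMP_TABLE_SIZE, MAX_HEAP_TABLE_SIZE)
          // memory_row_bytes(users))
```

Under these assumptions only a couple of thousand User rows fit before the table spills, however short the actual strings are. Declaring shorter VARCHAR lengths raises that number directly.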
