MySQL: Effect of converting data types and collations on stored data

Question

I have a general question. We often want to change the data types or collations of fields after a lot of data has already been inserted. Consider these situations:

  1. Converting a varchar column's collation from utf8_general_ci to latin1_swedish_ci: as far as I know, the first uses multibyte characters and the second single-byte ones. Does this conversion handle the stored records correctly? And does it reduce the volume of existing data (maybe by 50%)?

  2. Converting int(10) to smallint(5): does the volume of data really drop to 50%?

  3. Or, for example: int(10) to unsigned int(10), text to varchar(1000), varchar(20) to char(10), ...

Obviously, such changes might be made to improve efficiency, reduce the volume of data, and so on.

Suppose I have a table with 1,000,000 records. I want to know whether such changes have harmful effects on the stored data, or whether they degrade the performance of future inserts and selects involving this table.

UPDATE:
When I talk about changing the utf8 character set to Latin, the values in my field are of course English (obviously, any Japanese text would be lost). With that assumption, I'm asking about the resulting table size and performance.

Recommended answer

Converting varchar collation from utf8_general_ci to latin1_swedish_ci: as far as I know, the first uses multibyte characters and the second single-byte ones. Does this conversion handle the stored records correctly? And does it reduce the volume of existing data (maybe by 50%)?

Collation is merely the ordering used for string comparisons; it has (almost) nothing to do with the character encoding used for data storage. I say "almost" because a collation can only be used with certain character sets, so changing the collation may force a change in the character encoding.
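As a rough analogy, the distinction can be sketched in Python rather than MySQL. The two sort keys below only stand in for a binary and a case-insensitive collation; they are not MySQL's actual collation rules, and the sample words are made up for illustration.

```python
# Analogy only: two comparison rules applied to the same stored values.
words = ["banana", "apple", "Cherry"]

# The stored bytes are identical no matter how the values are compared later.
stored = [w.encode("utf-8") for w in words]

# Code-point ordering, loosely comparable to a binary collation.
binary_order = sorted(words)

# Case-insensitive ordering, loosely comparable to a *_ci collation.
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Cherry', 'apple', 'banana']
print(ci_order)      # ['apple', 'banana', 'Cherry']
```

The ordering differs while `stored` is untouched, which is the sense in which a collation change alone does not rewrite the data.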

To the extent that the character encoding is modified, MySQL will correctly re-encode the values to the new character set, whether going from single-byte to multibyte or vice versa. Beware that any values that become too large for the column will be truncated.

Provided that the new character type is variable-length and that the values are encoded with fewer bytes in the new encoding than before, there will of course be a reduction in the table's size.
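Whether values really need fewer bytes in the new encoding can be checked with a small Python sketch. It assumes Python's "utf-8" and "latin-1" codecs as stand-ins for MySQL's utf8 and latin1 character sets (MySQL's latin1 is closer to cp1252, but the two agree for the characters used here). The sample strings are hypothetical.

```python
def encoded_size(text, encoding):
    """Return the number of bytes `text` needs in `encoding`."""
    return len(text.encode(encoding))

english = "hello world"
accented = "caf\u00e9 cr\u00e8me"  # "café crème", written with escapes

for sample in (english, accented):
    utf8_bytes = encoded_size(sample, "utf-8")
    latin1_bytes = encoded_size(sample, "latin-1")
    print(sample, utf8_bytes, latin1_bytes)

# Characters outside the target character set cannot be re-encoded at all.
try:
    "\u65e5\u672c\u8a9e".encode("latin-1")  # Japanese text
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False
```

The English sample needs 11 bytes in both encodings, because ASCII characters take one byte in utf8 as well. Only the accented sample shrinks (12 bytes to 10), so for English-only data in a variable-length column the conversion would not be expected to save space.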

Conversion of int(10) to smallint(5): does the volume of data really drop to 50%?

INT and SMALLINT occupy 4 and 2 bytes respectively, regardless of display width, so yes, the space that column occupies will halve and the table will shrink accordingly.
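The arithmetic for the 1,000,000-row table from the question can be sketched as follows. This is a back-of-the-envelope estimate that ignores row, page, and index overhead; the helper names and the sample values are hypothetical.

```python
# Storage sizes in bytes for the two integer types (independent of display width).
INT_BYTES = 4
SMALLINT_BYTES = 2

# Value ranges of the signed and unsigned SMALLINT variants.
SMALLINT_SIGNED = (-32768, 32767)
SMALLINT_UNSIGNED = (0, 65535)

def column_bytes(rows, bytes_per_value):
    """Raw bytes for one fixed-width column, ignoring row and index overhead."""
    return rows * bytes_per_value

def fits(values, value_range):
    """True if every value lies inside the inclusive range."""
    low, high = value_range
    return all(low <= v <= high for v in values)

rows = 1_000_000
saved = column_bytes(rows, INT_BYTES) - column_bytes(rows, SMALLINT_BYTES)
print(saved)  # 2000000

sample_ids = [12, 40_000, 70_000]  # hypothetical existing values
print(fits(sample_ids, SMALLINT_SIGNED))    # False
print(fits(sample_ids, SMALLINT_UNSIGNED))  # False
```

The saving applies to that one column, so the table as a whole shrinks by less than 50% unless the column is all it contains. The range check is worth running first, since values outside the SMALLINT range cannot be stored after the conversion.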

Or, for example: int(10) to unsigned int(10), text to varchar(1000), varchar(20) to char(10), ...

  • INT occupies 4 bytes irrespective of whether it is signed, so there will be no change;

  • TEXT and VARCHAR(1000) both occupy L+2 bytes (where L is the value's length in bytes), so there will be no change;

  • VARCHAR(20) occupies L+1 bytes (where L is the value's length in bytes), whereas CHAR(10) occupies 10×w bytes (where w is the number of bytes required for the maximum-length character in the character set), so there may well be a change, but it depends on the actual values stored and the character encoding used.
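The formulas above can be turned into a small Python calculator. This is a sketch of the stated L+1 / L+2 and 10×w rules only; actual on-disk sizes may deviate depending on the storage engine and row format. The function names are hypothetical, and w is taken as 1 for latin1 and 3 for MySQL's utf8.

```python
def varchar_bytes(value, encoding, max_chars, max_bytes_per_char):
    """Bytes for a VARCHAR value: encoded length plus a 1- or 2-byte length prefix."""
    # One prefix byte suffices while the column can never exceed 255 bytes.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return len(value.encode(encoding)) + prefix

def char_bytes(declared_chars, max_bytes_per_char):
    """Bytes for a CHAR(n) value under the n×w rule: always the full declared width."""
    return declared_chars * max_bytes_per_char

value = "hello"

# latin1: w = 1
print(varchar_bytes(value, "latin-1", 20, 1))  # 6
print(char_bytes(10, 1))                       # 10

# utf8: w = 3
print(varchar_bytes(value, "utf-8", 20, 3))    # 6
print(char_bytes(10, 3))                       # 30
```

For a short value such as "hello", moving from VARCHAR(20) to CHAR(10) increases the space used, and the gap widens with a multibyte character set. The conversion only saves space when most values are close to the full declared width.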

Note that, depending on the storage engine, reductions in table size may not immediately be released to the filesystem.
