MySQL character encoding change. Is data integrity preserved?

Question

I will have to convert the database encoding from latin-1 to utf-8.

I'm aware that converting the database is done via the command

ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

(Source), and that converting the existing tables is done via

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

(Source).

However, the database already exists and sensitive information is involved. My question is whether the data I already have will be changed. I'm asking because I have to give an estimate before I make the change.

Recommended answer

Every (character string-type) column has its own character set and collation metadata.

If, when the column's data type was specified (i.e. when it was last created or altered), no character set/collation was explicitly given, then the table's default character set and collation would be used for the column.

If, when the table was specified, no default character set/collation was explicitly given, then the database's default character set and collation would be used for the table's default.

The commands that you quote in your question merely alter such default character sets/collations for the database and table respectively. In other words, they will only affect tables and columns that are created thereafter—they will not affect existing columns (or data).
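
To see which of these three levels currently holds which setting, the metadata can be read from information_schema before anything is changed. This is only a sketch: db_name is the placeholder schema name from the question, and the queries read metadata without touching any data.

```sql
-- Database-level default (consulted only when new tables are created).
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'db_name';

-- Table-level default (consulted only when new columns are added).
-- TABLES exposes the collation; its name begins with the character set name.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'db_name'
  AND TABLE_TYPE = 'BASE TABLE';

-- Column-level settings: these describe how the existing data is stored.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'db_name'
  AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

Columns that still report latin1 in the last query after an ALTER DATABASE or ALTER TABLE ... DEFAULT CHARACTER SET are the ones whose data has not been converted.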

To update existing data, you should first read the Changing the Character Set section of the manual page on ALTER TABLE:

Changing the Character Set

To change the table default character set and all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

The statement also changes the collation of all character columns. If you specify no COLLATE clause to indicate which collation to use, the statement uses default collation for the character set. If this collation is inappropriate for the intended table use (for example, if it would change from a case-sensitive collation to a case-insensitive collation), specify a collation explicitly.
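
As an illustration of naming the collation explicitly, the sketch below lists the available collations and then converts with one of them. The choice of utf8mb4 and utf8mb4_bin is an assumption made for the example (the question only says "utf-8", and the manual excerpt uses utf8); tbl_name is the placeholder from the manual.

```sql
-- List the collations that exist for the target character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Convert the table default and all character columns, stating the
-- collation explicitly instead of accepting the character set's default.
-- utf8mb4_bin is a binary, case-sensitive collation (assumed choice).
ALTER TABLE tbl_name
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
```

Whether a case-sensitive or case-insensitive collation is appropriate depends on how the table's comparisons and unique keys are expected to behave.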

For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length does not fit in a TEXT column's length bytes, so MySQL converts the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.
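
The type change described above can be observed on a throwaway table. This is a hedged sketch: charset_demo is a hypothetical scratch table, and the statements assume a default database has been selected so that DATABASE() returns it.

```sql
-- Scratch table with a latin1 TEXT column.
CREATE TABLE charset_demo (body TEXT) CHARACTER SET latin1;

-- Column type before the conversion.
SELECT COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'charset_demo';

ALTER TABLE charset_demo CONVERT TO CHARACTER SET utf8;

-- Column type after the conversion; per the manual text above,
-- the TEXT column is expected to be reported as mediumtext now.
SELECT COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'charset_demo';

DROP TABLE charset_demo;
```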

To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:

ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
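
One point the excerpt does not spell out: MODIFY replaces the whole column definition, so attributes such as NOT NULL, DEFAULT, or COMMENT that are omitted from the statement are dropped. A minimal sketch, assuming a hypothetical existing definition of VARCHAR(100) NOT NULL DEFAULT '':

```sql
-- Look up the full current definition of the column first.
SHOW CREATE TABLE t;

-- Suppose it reports (hypothetical):
--   `latin1_varchar_col` varchar(100) NOT NULL DEFAULT ''
-- Repeat every attribute and change only the character set.
ALTER TABLE t
  MODIFY latin1_varchar_col VARCHAR(100) CHARACTER SET utf8 NOT NULL DEFAULT '';
```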

If you specify CONVERT TO CHARACTER SET binary, the CHAR, VARCHAR, and TEXT columns are converted to their corresponding binary string types (BINARY, VARBINARY, BLOB). This means that the columns no longer will have a character set attribute and a subsequent CONVERT TO operation will not apply to them.

If charset_name is DEFAULT in a CONVERT TO CHARACTER SET operation, the character set named by the character_set_database system variable is used.
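
Because CONVERT TO rewrites the stored values, a before/after comparison can support the estimate asked for in the question. The sketch below uses the placeholder names tbl_name and col_name. It assumes the column really holds latin1 text, in which case the character count should be unchanged while the byte count may grow, since non-ASCII characters take more than one byte in utf8.

```sql
-- Run once before and once after the conversion, then compare the rows.
SELECT COUNT(*)                   AS row_count,
       SUM(CHAR_LENGTH(col_name)) AS total_characters, -- expected to stay the same
       SUM(LENGTH(col_name))      AS total_bytes       -- may grow after conversion
FROM tbl_name;
```

This is a coarse check, not a substitute for a backup taken before the ALTER TABLE.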

Warning

The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

The reason this works is that there is no conversion when you convert to or from BLOB columns.
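
Before choosing between CONVERT TO and the BLOB round trip, it can help to preview how the stored bytes read under each interpretation. The query below is a read-only sketch for a latin1-declared column c1 in t1 (the names from the example above); casting to BINARY drops the declared character set so that the same bytes can be reread as utf8.

```sql
-- Compare the declared reading of each value with a utf8 reading
-- of exactly the same stored bytes. No data is modified.
SELECT c1                                     AS as_declared,
       HEX(c1)                                AS stored_bytes,
       CONVERT(CAST(c1 AS BINARY) USING utf8) AS read_as_utf8
FROM t1
LIMIT 20;
```

If read_as_utf8 shows the intended text while as_declared shows sequences such as "Ã©", the column probably contains utf8 bytes under a latin1 label and falls under the warning above. If as_declared already looks right, an ordinary conversion is the likelier fit. The exact display also depends on the connection character set, so treat the output as a hint rather than proof.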

To change only the default character set for a table, use this statement:

ALTER TABLE tbl_name DEFAULT CHARACTER SET charset_name;

The word DEFAULT is optional. The default character set is the character set that is used if you do not specify the character set for columns that you add to a table later (for example, with ALTER TABLE ... ADD column).

When the foreign_key_checks system variable is enabled, which is the default setting, character set conversion is not permitted on tables that include a character string column used in a foreign key constraint. The workaround is to disable foreign_key_checks before performing the character set conversion. You must perform the conversion on both tables involved in the foreign key constraint before re-enabling foreign_key_checks. If you re-enable foreign_key_checks after converting only one of the tables, an ON DELETE CASCADE or ON UPDATE CASCADE operation could corrupt data in the referencing table due to implicit conversion that occurs during these operations (Bug #45290, Bug #74816).
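
The workaround just described could be sketched as follows. parent and child are hypothetical tables joined by a foreign key on a string column, and utf8 is used as the target to match the examples above.

```sql
-- Disable the check for this session only.
SET foreign_key_checks = 0;

-- Convert BOTH sides of the foreign key relationship.
ALTER TABLE parent CONVERT TO CHARACTER SET utf8;
ALTER TABLE child  CONVERT TO CHARACTER SET utf8;

-- Re-enable only after every table involved has been converted.
SET foreign_key_checks = 1;
```

Running all four statements in the same session keeps the window in which the check is disabled as short as possible.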
