MySQL Illegal mix of collations


Question

After viewing my prod logs, I found an error mentioning:

[2012-08-31 15:56:43] request.CRITICAL: Doctrine\DBAL\DBALException: 
An exception occurred while executing 'SELECT t0.username ....... FROM fos_user t0 WHERE t0.username = ?'
with params {"1":"Nrv\u29e7Kasi"}:

SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '=' 

Although I have UTF-8 set as the default in the Doctrine config:

doctrine:
    dbal:
        charset:  UTF8

It seems that all my MySQL tables are in latin1_swedish_ci, so my question is:

Can I manually change the collation to utf8_general_ci for all my tables without any complications/precautions?

Recommended answer

It is helpful to understand the following definitions:

  • A character encoding details how each symbol is represented in binary (and therefore stored in the computer). For example, the symbol é (U+00E9, Latin small letter e with acute) is encoded as 0xc3a9 in UTF-8 (which MySQL calls utf8) and as 0xe9 in Windows-1252 (which MySQL calls latin1).

  • A character set is the alphabet of symbols that can be represented using a given character encoding. Confusingly, the term is also used to mean the same as character encoding.

  • A collation is an ordering on a character set, so that strings can be compared. For example, MySQL's latin1_swedish_ci collation treats most accented variations of a character as equivalent to the base character, whereas its latin1_general_ci collation orders them before the next base character without treating them as equivalent (there are other, more significant, differences too, such as the order of characters like å, ä, ö and ß).
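The definitions above can be checked from any MySQL client. The following is a minimal sketch; it uses hexadecimal literals with character set introducers so that the result does not depend on how the client itself encodes the typed characters. The byte values in the comments come from the example above, and the comparison results are what the collation descriptions above would lead one to expect.

```sql
-- Encoding: the same character é has different byte representations.
SELECT HEX(_utf8 X'C3A9');                        -- C3A9: é as stored in utf8
SELECT HEX(CONVERT(_utf8 X'C3A9' USING latin1));  -- E9:   é as stored in latin1

-- Collation: whether é (0xE9) compares equal to e (0x65) depends on the
-- collation used for the comparison.
SELECT _latin1 X'E9' = _latin1 X'65' COLLATE latin1_swedish_ci AS swedish_equal,
       _latin1 X'E9' = _latin1 X'65' COLLATE latin1_general_ci AS general_equal;
-- Expected from the description above: equal under latin1_swedish_ci,
-- not equal under latin1_general_ci.
```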

MySQL decides which collation to apply to a given expression as documented under Collation of Expressions; in particular, the collation of a column takes precedence over that of a string literal.

The WHERE clause of your query compares the following strings:

  1. a value in fos_user.username, encoded in the column's character set (Windows-1252) and expressing a preference for its collation latin1_swedish_ci (with a coercibility value of 2); with

  2. the string literal 'Nrv⧧Kasi', encoded in the connection's character set (UTF-8, as configured by Doctrine) and expressing a preference for the connection's collation utf8_general_ci (with a coercibility value of 4).

Since the first of these strings has a lower coercibility value than the second, MySQL attempts to perform the comparison using that string's collation: latin1_swedish_ci. To do so, MySQL attempts to convert the second string to latin1, but since the character ⧧ (U+29E7) does not exist in that character set, the comparison fails.
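The coercibility values mentioned above can be inspected directly with MySQL's COLLATION() and COERCIBILITY() functions. This is only an illustrative sketch; it assumes the fos_user table from the question and a connection configured for utf8.

```sql
-- Lower coercibility wins when MySQL picks the collation for a comparison.

-- A column carries an implicit collation: coercibility 2.
SELECT COLLATION(username), COERCIBILITY(username)
FROM fos_user
LIMIT 1;

-- A plain string literal takes the connection collation: coercibility 4.
SELECT COLLATION('Nrv'), COERCIBILITY('Nrv');

-- An explicit COLLATE clause has the strongest preference: coercibility 0.
SELECT COERCIBILITY(_utf8 'Nrv' COLLATE utf8_general_ci);
```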

It is worth pausing for a moment to consider how the column is currently encoded: you are attempting to filter for records where fos_user.username is equal to a string that contains a character which cannot exist in that column!

If you believe that the column does contain such characters, then you probably wrote to the column whilst the connection character encoding was set to something (e.g. latin1) that caused MySQL to interpret the received byte sequence as characters which are all in the Windows-1252 character set.
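One rough way to check for this situation is to look at the bytes actually stored in the column. The query below is a heuristic sketch only, not a reliable detector: it flags values whose stored bytes contain something shaped like a multi-byte UTF-8 sequence, which legitimate Windows-1252 text can occasionally contain as well.

```sql
-- Show the stored bytes next to the value as MySQL currently decodes it.
SELECT username, HEX(username) AS stored_bytes
FROM fos_user
-- Heuristic: a UTF-8 lead byte (C2..EF) followed by a continuation byte
-- (80..BF), aligned on a byte boundary of the hex string.
WHERE HEX(username) REGEXP '^(..)*(C[2-9A-F]|[DE][0-9A-F])[89AB][0-9A-F]';
```

If, for example, é shows up as C3A9 in a latin1 column rather than E9, the column most likely holds UTF-8 bytes under a latin1 label.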

If this is the case, before continuing any further you should fix your data!

  1. convert such columns to the character encoding that was used on data insertion, if different to the incumbent encoding:

    ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET foo;

  2. drop the encoding information associated with such columns by converting them to the binary character set:

    ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET binary;

  3. associate with such columns the encoding in which data was actually transmitted by converting them to the relevant character set:

    ALTER TABLE fos_user MODIFY username VARCHAR(123) CHARACTER SET bar;

    Note that, if converting from a multi-byte encoding, you may need to increase the size of the column (or even change its type) in order to accommodate the maximum possible length of the converted string.
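As a concrete sketch of the steps above, consider what is probably the most common case: the column is declared latin1, but the application sent UTF-8 bytes over a latin1 connection. The first step is then unnecessary, because the incumbent encoding already matches the one used on insertion. The column length of 255 is a placeholder, not a value taken from the question.

```sql
-- Assumption: username is declared latin1 but actually holds UTF-8 bytes.
-- Take a backup first. MODIFY must restate the full column definition
-- (length, NULL-ability, default); VARCHAR(255) is only a placeholder.

-- Drop the encoding label: the bytes are kept, no conversion takes place.
ALTER TABLE fos_user MODIFY username VARCHAR(255) CHARACTER SET binary;

-- Re-label the same bytes with the encoding they were really written in.
ALTER TABLE fos_user MODIFY username VARCHAR(255) CHARACTER SET utf8;

-- Spot-check the result before moving on.
SELECT username, HEX(username) FROM fos_user LIMIT 10;
```

Going through the binary character set is what prevents MySQL from transcoding the data; converting straight from latin1 to utf8 would re-encode the already mis-labelled bytes.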

    Once one is certain that the columns are correctly encoded, one could force the comparison to be conducted using a Unicode collation by either:

    • explicitly converting the value fos_user.username to a Unicode character set:

    WHERE CONVERT(fos_user.username USING utf8) = ?
    

    • forcing the string literal to have a lower coercibility value than the column (this causes an implicit conversion of the column's value to UTF-8):

    WHERE fos_user.username = ? COLLATE utf8_general_ci
    

    Or one could, as you say, permanently convert the column(s) to a Unicode encoding and set the collation appropriately.

    "Can I manually change the collation to utf8_general_ci for all my tables without any complications/precautions?"

    The principal consideration is that Unicode encodings take up more space than single-byte character sets, so:

    • more storage may be required;

    • comparisons may be slower; and

    • index prefix lengths may need to be adjusted (note that the maximum is in bytes, so may represent fewer characters than previously).
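The index prefix point can be sketched as follows. The index name idx_username is hypothetical, and the byte limit that applies depends on the storage engine and row format (some InnoDB row formats, for instance, cap an index key prefix at 767 bytes), so treat the numbers as an example rather than a rule.

```sql
-- List the indexes on the table; Sub_part shows any prefix length.
SHOW INDEX FROM fos_user;

-- With utf8 a character may need up to 3 bytes, so a 255-character prefix
-- can occupy up to 765 bytes, where latin1 needed only 255.
-- idx_username is a hypothetical index name.
ALTER TABLE fos_user
  DROP INDEX idx_username,
  ADD INDEX idx_username (username(255));
```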

    Also, be aware that, as documented under ALTER TABLE Syntax:

    To change the table default character set and all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:

    ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;

    For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.

    To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns.
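A per-column conversion along those lines might look like the sketch below. It assumes the data is already correctly encoded (see the repair steps earlier), and VARCHAR(255) is again a placeholder for the column's real definition.

```sql
-- Find character columns in the current schema that still use the old collation.
SELECT table_name, column_name, column_type, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND collation_name = 'latin1_swedish_ci';

-- Convert one column at a time, restating its type so that it is not changed.
ALTER TABLE fos_user
  MODIFY username VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Optionally change the table default, which only affects columns added later.
ALTER TABLE fos_user DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
```

Unlike the binary round trip used to repair mis-labelled data, this MODIFY does transcode the stored values from latin1 to utf8, which is what is wanted once the existing data is known to be good.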
