MySQL case sensitivity in utf8_general_ci

Problem description

I have a MySQL database where I use utf8_general_ci (which is case-insensitive), and in my tables I have some columns, such as ID, that hold case-sensitive data (for example 'iSZ6fX' or 'AscSc2').

To distinguish uppercase from lowercase, is it better to set utf8_bin on these columns only, like this:

CREATE TABLE `test` (
  `id` VARCHAR(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `value1` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) ENGINE = MyISAM CHARACTER SET utf8 COLLATE utf8_general_ci;

Or is it better to use utf8_general_ci on all columns and use BINARY in the PHP query, for example:

mysqli_query($link, "SELECT * FROM test WHERE BINARY id = 'iSZ6fX'");

Solution

It is better to use the utf8_bin collation. This cannot happen with UTF-8, but in the general case a byte-by-byte binary comparison can order strings differently from a comparison by character (it does happen with UTF-16). A binary collation accounts for that difference and a plain binary comparison does not. As documented under Unicode Character Sets:

There is a difference between "ordering by the character's code value" and "ordering by the character's binary representation," a difference that appears only with utf16_bin, because of surrogates.

Suppose that utf16_bin (the binary collation for utf16) was a binary comparison "byte by byte" rather than "character by character." If that were so, the order of characters in utf16_bin would differ from the order in utf8_bin. For example, the following chart shows two rare characters. The first character is in the range E000-FFFF, so it is greater than a surrogate but less than a supplementary. The second character is a supplementary.

Code point  Character                    utf8         utf16
----------  ---------                    ----         -----
0FF9D       HALFWIDTH KATAKANA LETTER N  EF BE 9D     FF 9D
10384       UGARITIC LETTER DELTA        F0 90 8E 84  D8 00 DF 84

The two characters in the chart are in order by code point value because 0xff9d < 0x10384. And they are in order by utf8 value because 0xef < 0xf0. But they are not in order by utf16 value, if we use byte-by-byte comparison, because 0xff > 0xd8.

So MySQL's utf16_bin collation is not "byte by byte." It is "by code point." When MySQL sees a supplementary-character encoding in utf16, it converts to the character's code-point value, and then compares. Therefore, utf8_bin and utf16_bin are the same ordering. This is consistent with the SQL:2008 standard requirement for a UCS_BASIC collation: "UCS_BASIC is a collation in which the ordering is determined entirely by the Unicode scalar values of the characters in the strings being sorted. It is applicable to the UCS character repertoire. Since every character repertoire is a subset of the UCS repertoire, the UCS_BASIC collation is potentially applicable to every character set. NOTE 11: The Unicode scalar value of a character is its code point treated as an unsigned integer."
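The orderings described above can be checked with a short Python sketch. Python is used here only for illustration. The helper `utf16be_code_points` is hypothetical and merely models the "by code point" behaviour the documentation describes: it combines each surrogate pair into one code point and then compares. It is not MySQL's code. The `"utf-16-be"` codec is used so the bytes match the chart, without a byte-order mark.

```python
# Illustration only: models the orderings described in the MySQL
# documentation above; this is not MySQL's implementation.

def utf16be_code_points(data: bytes) -> list[int]:
    """Decode big-endian UTF-16 bytes into code points, combining each
    surrogate pair into one supplementary code point (hypothetical helper)."""
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    points = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # High surrogate + low surrogate -> one supplementary code point
            points.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            # BMP character (or an unpaired surrogate, kept as-is)
            points.append(u)
            i += 1
    return points


n = "\uff9d"          # HALFWIDTH KATAKANA LETTER N
delta = "\U00010384"  # UGARITIC LETTER DELTA

# The byte sequences from the chart: utf8 | utf16
print(n.encode("utf-8").hex(" "), "|", n.encode("utf-16-be").hex(" "))          # ef be 9d | ff 9d
print(delta.encode("utf-8").hex(" "), "|", delta.encode("utf-16-be").hex(" "))  # f0 90 8e 84 | d8 00 df 84

print(ord(n) < ord(delta))                                # True: by code point
print(n.encode("utf-8") < delta.encode("utf-8"))          # True: utf8 bytes agree
print(n.encode("utf-16-be") < delta.encode("utf-16-be"))  # False: utf16 bytes disagree
print(utf16be_code_points(n.encode("utf-16-be"))
      < utf16be_code_points(delta.encode("utf-16-be")))   # True: decoded order agrees again
```

Comparing the decoded code points, rather than the raw UTF-16 bytes, gives the same order as UTF-8. That is why utf8_bin and utf16_bin sort identically.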

Therefore, if comparisons involving these columns will always be case-sensitive, you should set the columns' collation to utf8_bin, so that they stay case-sensitive even if you forget to specify otherwise in a query. If only particular queries need to be case-sensitive, you can instead use the COLLATE keyword to specify that the utf8_bin collation should be used:

SELECT * FROM test WHERE id = 'iSZ6fX' COLLATE utf8_bin;
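To see why ID columns like these need a binary collation, here is a simplified Python model of the two kinds of lookup. The helpers `equal_ci` and `equal_bin` and the sample IDs are hypothetical. `str.casefold()` only approximates a case-insensitive collation such as utf8_general_ci and ignores the rest of MySQL's collation rules.

```python
# Simplified model, not MySQL's real collation rules (assumption):
# str.casefold() stands in for a case-insensitive collation such as
# utf8_general_ci, and comparing UTF-8 bytes stands in for utf8_bin.

def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive equality (rough approximation of utf8_general_ci)."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str) -> bool:
    """Exact byte equality (rough approximation of utf8_bin)."""
    return a.encode("utf-8") == b.encode("utf-8")


ids = ["iSZ6fX", "isz6fx", "AscSc2"]  # hypothetical stored IDs
wanted = "iSZ6fX"

print([i for i in ids if equal_ci(i, wanted)])   # ['iSZ6fX', 'isz6fx']
print([i for i in ids if equal_bin(i, wanted)])  # ['iSZ6fX']
```

Under the case-insensitive comparison, two distinct IDs match the same lookup. A utf8_bin column collation, or a per-query COLLATE utf8_bin, avoids that.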
