MySQL case sensitivity with utf8_general_ci
Question
I have a MySQL database that uses utf8_general_ci (which is case-insensitive), and some columns in my tables, such as ID, hold case-sensitive data (for example 'iSZ6fX' or 'AscSc2').
To distinguish uppercase from lowercase, is it better to set utf8_bin on just these columns, like this:
CREATE TABLE `test` (
    `id` VARCHAR(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
    `value1` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;
Or is it better to use utf8_general_ci on all columns and use BINARY in the PHP query, for example:
mysqli_query($link, "SELECT * FROM `table` WHERE BINARY id = 'iSZ6fX'");
Solution
It is better to use the utf8_bin collation. Although it cannot happen in UTF-8, in the general case it is theoretically possible (as happens with UTF-16) for the same string to be represented by different encodings, which a binary comparison would not understand but a binary collation would. As documented under Unicode Character Sets:

    There is a difference between "ordering by the character's code value" and "ordering by the character's binary representation," a difference that appears only with utf16_bin, because of surrogates.

    Suppose that utf16_bin (the binary collation for utf16) was a binary comparison "byte by byte" rather than "character by character." If that were so, the order of characters in utf16_bin would differ from the order in utf8_bin. For example, the following chart shows two rare characters. The first character is in the range E000-FFFF, so it is greater than a surrogate but less than a supplementary. The second character is a supplementary.

    Code point  Character                    utf8         utf16
    ----------  ---------------------------  -----------  -----------
    0FF9D       HALFWIDTH KATAKANA LETTER N  EF BE 9D     FF 9D
    10384       UGARITIC LETTER DELTA        F0 90 8E 84  D8 00 DF 84

    The two characters in the chart are in order by code point value because 0xff9d < 0x10384. And they are in order by utf8 value because 0xef < 0xf0. But they are not in order by utf16 value, if we use byte-by-byte comparison, because 0xff > 0xd8.

    So MySQL's utf16_bin collation is not "byte by byte." It is "by code point." When MySQL sees a supplementary-character encoding in utf16, it converts to the character's code-point value, and then compares. Therefore, utf8_bin and utf16_bin are the same ordering. This is consistent with the SQL:2008 standard requirement for a UCS_BASIC collation: "UCS_BASIC is a collation in which the ordering is determined entirely by the Unicode scalar values of the characters in the strings being sorted. It is applicable to the UCS character repertoire. Since every character repertoire is a subset of the UCS repertoire, the UCS_BASIC collation is potentially applicable to every character set. NOTE 11: The Unicode scalar value of a character is its code point treated as an unsigned integer."

Therefore, if comparisons involving these columns will always be case-sensitive, you should set the column's collation to utf8_bin (so that they remain case-sensitive even if you forget to specify otherwise in your query). If only particular queries are case-sensitive, you could instead request the utf8_bin collation for those queries with the COLLATE keyword:
SELECT * FROM `table` WHERE id = 'iSZ6fX' COLLATE utf8_bin;
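
The ordering argument in the quoted documentation can be checked outside MySQL. The following is a minimal Python sketch, not MySQL's implementation: it encodes the two characters from the chart and compares them by code point, by UTF-8 bytes, and by UTF-16 bytes. The variable and function names are made up for this illustration, and big-endian UTF-16 is assumed so that the bytes match the chart.

```python
# Code points of the two characters from the chart.
KATAKANA_N = 0xFF9D      # HALFWIDTH KATAKANA LETTER N
UGARITIC_DELTA = 0x10384 # UGARITIC LETTER DELTA (a supplementary character)


def encodings(code_point):
    """Return the (UTF-8, UTF-16 big-endian) byte strings for one code point."""
    ch = chr(code_point)
    return ch.encode("utf-8"), ch.encode("utf-16-be")


katakana_utf8, katakana_utf16 = encodings(KATAKANA_N)
delta_utf8, delta_utf16 = encodings(UGARITIC_DELTA)

# Python compares bytes objects byte by byte (lexicographically).
by_code_point = KATAKANA_N < UGARITIC_DELTA
by_utf8_bytes = katakana_utf8 < delta_utf8
by_utf16_bytes = katakana_utf16 < delta_utf16

print("utf8 :", katakana_utf8.hex(" "), "|", delta_utf8.hex(" "))
print("utf16:", katakana_utf16.hex(" "), "|", delta_utf16.hex(" "))
print("in order by code point: ", by_code_point)
print("in order by utf8 bytes: ", by_utf8_bytes)
print("in order by utf16 bytes:", by_utf16_bytes)
```

The byte-by-byte UTF-16 comparison is the only one that puts the two characters in the opposite order, which is why a purely byte-wise comparison of utf16 data would disagree with utf8_bin while a code-point comparison does not.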