How to make MySQL handle strings like SQLite does, with regard to Unicode and collation?
Question
I've been researching this question for several hours now, on SO, in MySQL docs, and elsewhere, but still can't find a satisfactory solution. The problem is:
What is the easiest way to make MySQL treat strings the way SQLite does, without any extra "smart" conversions?
For example, the following works perfectly in SQLite:
CREATE TABLE `dummy` (`key` VARCHAR(255) NOT NULL UNIQUE);
INSERT INTO `dummy` (`key`) VALUES ('one');
INSERT INTO `dummy` (`key`) VALUES ('one ');
INSERT INTO `dummy` (`key`) VALUES ('One');
INSERT INTO `dummy` (`key`) VALUES ('öne');
SELECT * FROM `dummy`;
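To check the SQLite side of this, here is a minimal sketch that runs the same statements through Python's built-in sqlite3 module against an in-memory database. The helper name insert_keys is made up for this illustration; the table and column names are the ones from the snippet above.

```python
import sqlite3


def insert_keys(keys):
    """Insert each key into a fresh in-memory table and return the stored rows."""
    conn = sqlite3.connect(":memory:")
    # Same declaration as in the question; SQLite accepts backtick quoting.
    conn.execute("CREATE TABLE `dummy` (`key` VARCHAR(255) NOT NULL UNIQUE)")
    for key in keys:
        # A duplicate under the column's collation would raise IntegrityError here.
        conn.execute("INSERT INTO `dummy` (`key`) VALUES (?)", (key,))
    rows = [row[0] for row in conn.execute("SELECT `key` FROM `dummy` ORDER BY rowid")]
    conn.close()
    return rows


if __name__ == "__main__":
    # "\u00f6ne" is 'öne' with a precomposed o-umlaut.
    print(insert_keys(["one", "one ", "One", "\u00f6ne"]))
```

All four inserts go through because SQLite's default BINARY collation compares the stored values byte by byte and does not ignore trailing spaces.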
However, in MySQL, with the following settings:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_bin
and the following CREATE DATABASE statement:
CREATE DATABASE `dummydb` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_bin;
it still fails on the second INSERT.
I'd rather keep string column declarations as simple as possible, SQLite's TEXT being the ideal. It looks like VARBINARY is the way to go, but I would still like to hear your opinions on any other, potentially better options.
Addendum: the output of SHOW CREATE TABLE dummy is:
mysql> SHOW CREATE TABLE dummy;
+-------+-----------------------------------------------------
| Table | Create Table
+-------+-----------------------------------------------------
| dummy | CREATE TABLE `dummy` (
`key` varchar(255) COLLATE utf8mb4_bin NOT NULL,
UNIQUE KEY `key` (`key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------
1 row in set (0.00 sec)
Recommended answer
MySQL wants to convert strings when doing INSERT and SELECT. The conversion is between the encoding you declare the client to be using and the encoding the column is declared to store.
The only way to avoid that is to use VARBINARY and BLOB instead of VARCHAR and TEXT.
The use of COLLATION utf8mb4_bin does not avoid conversion to/from CHARACTER SET utf8mb4; it merely says that WHERE and ORDER BY should compare the bits instead of dealing with accents and case folding.
Keep in mind that CHARACTER SET utf8mb4 is a way to encode text; COLLATION utf8mb4_* is a set of rules for comparing texts in that encoding. _bin is simpleminded.
UNIQUE involves comparing for equality, hence COLLATION matters. In most utf8mb4 collations, the 3 values without the trailing space will compare equal. utf8mb4_bin will treat the 3 as different. utf8mb4_hungarian_ci treats one=One>öne.
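The dependence of UNIQUE on the collation can be tried out without a MySQL server by using SQLite's built-in collations as an analogy. This is only a sketch: rejected_keys is a hypothetical helper, and SQLite's NOCASE folds ASCII letters only, so unlike the accent-insensitive MySQL collations mentioned above it still keeps öne distinct from one.

```python
import sqlite3


def rejected_keys(collation, keys):
    """Return the keys that a UNIQUE column with the given collation refuses."""
    conn = sqlite3.connect(":memory:")
    # The collation name is interpolated into DDL, so pass only trusted
    # built-in names such as BINARY or NOCASE.
    conn.execute(f"CREATE TABLE dummy (k TEXT COLLATE {collation} NOT NULL UNIQUE)")
    rejected = []
    for key in keys:
        try:
            conn.execute("INSERT INTO dummy (k) VALUES (?)", (key,))
        except sqlite3.IntegrityError:
            # The key compared equal to an existing row under this collation.
            rejected.append(key)
    conn.close()
    return rejected


if __name__ == "__main__":
    keys = ["one", "One", "\u00f6ne"]
    print(rejected_keys("BINARY", keys))
    print(rejected_keys("NOCASE", keys))
```

The same three values are all accepted under BINARY, while NOCASE refuses One because it compares equal to one.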
The trailing spaces are controlled by the datatype of the column (VARCHAR or other). The latest version even has a setting relating to whether to consider trailing spaces.
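The effect of ignoring trailing spaces in a comparison can again be sketched with SQLite, whose built-in RTRIM collation behaves like BINARY except that trailing spaces are ignored. This is an analogy only: equal_under is a hypothetical helper, and the matching MySQL setting is assumed to be the pad attribute of the collation (PAD SPACE versus NO PAD), which should be checked against the documentation for your version.

```python
import sqlite3


def equal_under(collation, left, right):
    """Ask SQLite whether two strings compare equal under a built-in collation."""
    conn = sqlite3.connect(":memory:")
    # The collation name is interpolated into SQL, so pass only trusted
    # built-in names such as BINARY or RTRIM.
    (result,) = conn.execute(
        f"SELECT ? = ? COLLATE {collation}", (left, right)
    ).fetchone()
    conn.close()
    return bool(result)


if __name__ == "__main__":
    print(equal_under("BINARY", "one", "one "))
    print(equal_under("RTRIM", "one", "one "))
```

Under a rule that ignores trailing spaces, 'one' and 'one ' are the same key, which is why a UNIQUE constraint rejects the second INSERT from the question.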