在MySQL 5.6中将整数与显式排序规则进行比较时如何避免无效错误 [英] How to avoid invalid errors when comparing integers with explicit collation in MySQL 5.6
问题描述
好吧,这不是有史以来最清晰的标题;随时改善.
Okay, not the clearest title ever; feel free to improve.
我有一张表格,代表数千种语言形式.其中许多都大量使用变音符号,因此 aha , áha̱ 和 ā̧́ḫà̀ 都可能出现.该表(和数据库)使用UTF-8作为字符集,并使用 utf8mb4_unicode_520_ci
作为默认归类方案,因为搜索应区分大小写和变音符(因此搜索 aha 应该调出所有三个).这些表格都是由人工手动输入的,因此不可避免地会有重复.
I have a table representing thousands of linguistic forms. Many of these make heavy use of diacritics, so all of aha, áha̱ and ā̧́ḫà̀ may appear. The table (and the database) uses UTF-8 as character set and utf8mb4_unicode_520_ci
as the default collation scheme, since searching should be case- and diacritic-agnostic (so searching for aha should bring up all three). These forms have all been entered in manually by human beings, though, so there are inevitably duplicates.
我目前正在尝试获取完全相同形式的列表,以消除重复(手动-每个令牌都必须在删除之前进行检查),但是在这种情况下,我需要在变音符号中进行搜索-感知方式–也就是说,鉴于上面列出的三个标记,由于变音符号,它们是三种不同的形式,因此我希望搜索不会产生任何结果.
I’m currently trying to get a list of exactly identical forms in order to get rid of duplicates (manually – each token would have to be checked before being removed), but in this case I need to search in a diacritic-aware manner – that is, given the three tokens listed above, I would expect a search to yield no results, since they are three different forms because of the diacritics.
我认为这应该是一个相当容易的任务;只是做:
I figured this should be a fairly easy task; just do:
SELECT token FROM table GROUP BY token HAVING COUNT(token) > 1 COLLATE utf8mb4_bin
但是a,那行不通.相反,它给我一个错误消息,即"COLLATION utf8mb4_bin对字符集latin1无效".我应该注意到,我在任何地方都没有拉丁文-1,没有字符集,没有排序规则,没有服务器字符集,也没有.也没有存储过程或其他任何可能会出现Latin-1的内容.
But alas, that does not work. Instead, it gives me an error message that "COLLATION utf8mb4_bin is not valid for CHARACTER SET latin1". I should note that I have absolutely nothing Latin-1 anywhere – no character sets, no collations, no server charsets, nothing. There are also no stored procedures or anything else where Latin-1 might creep in.
不,这是因为此错误,该错误已修复从5.7开始;请参阅底部的说明:
No, this is because of this bug, which is apparently fixed from 5.7 onwards; see the description at the bottom:
对于诸如ORDER BY numeric_expr COLLATE collation_name的构造,该表达式的字符集被视为latin1,如果COLLATE之后指定的排序规则与latin1不兼容,则会导致错误.现在,当在存在COLLATE的情况下将数字表达式隐式转换为字符表达式时,使用的字符集就是与命名排序规则关联的字符集.
For constructs such as ORDER BY numeric_expr COLLATE collation_name, the character set of the expression was treated as latin1, which resulted in an error if the collation specified after COLLATE is incompatible with latin1. Now when a numeric expression is implicitly cast to a character expression in the presence of COLLATE, the character set used is the one associated with the named collation.
很遗憾,我的系统是5.6,并且无法进行升级(烦人).将数据转换为Latin-1也是一种选择,也不能更改表上的排序规则.
Unfortunately, I’m on 5.6, and I don’t have the option of upgrading (annoyingly). Converting the data to Latin-1 is also not an option, nor is changing the collation on the table.
有没有一种方法可以运行我的查询或等效的查询,得到我想要的结果集,而不会出现排序规则错误?
Is there a way to run my query or an equivalent one yielding the result set I’m after, without getting the collation error?
推荐答案
SET NAMES utf8mb4;
CREATE TABLE x(s VARCHAR(11) COLLATE utf8mb4_unicode_520_ci NOT NULL);
INSERT INTO x (s)
VALUES ('aha'), ('áha̱'), ('ā̧́ḫà̀'),
('i'), ('i̯');
SELECT s FROM x GROUP BY s HAVING COUNT(*) > 1;
回来
啊哈我
没有对数字内容的任何抱怨.
without any complaints about numeric stuff.
我在5.6.46、5.7.26、8.0.16和多个MariaDB版本上运行它.
I ran it on 5.6.46, 5.7.26, 8.0.16 and several MariaDB versions.
我在做什么与您的情况有所不同?
What I am doing differently than your case?
在添加重复的 COLLATE
子句时,将其放在需要它的查询组件中.( COLLATE
不适用于整个查询;不同部分的整理方式也可能不同.)
When adding an explicate COLLATE
clause, put it on the component of the query that needs it. (COLLATE
does not apply to the query as a whole; different parts can be collated differently.)
这篇关于在MySQL 5.6中将整数与显式排序规则进行比较时如何避免无效错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!