MySQL select UTF-8 string with '=' but not with 'LIKE'


Question

I have a table with some words that come from medieval books and contain accented letters that no longer exist in the modern latin1 alphabet. I can represent these letters easily with Unicode combining characters stored as UTF-8. For example, to create a "J" with a tilde, I use the sequence \u004A + \u0303, and the J is rendered with a tilde.

The table uses utf8 encoding and the field collation is utf8_unicode_ci.

My problem is the following: if I select the entire string using '=', I get the correct answer. If I select using 'LIKE', I get the wrong answer.

For example:

mysql> select word, hex(word) from oldword where word = 'hua';
+--------+--------------+
| word   | hex(word)    |
+--------+--------------+
| hũa    | 6875CC8361   |
| huã    | 6875C3A3     |
| hua    | 687561       |
| hũã    | 6875CC83C3A3 |
+--------+--------------+
4 rows in set (0,04 sec)

mysql> select word, hex(word) from oldword where word like 'hua';
+-------+------------+
| word  | hex(word)  |
+-------+------------+
| huã   | 6875C3A3   |
| hua   | 687561     |
+-------+------------+
2 rows in set (0,04 sec)

I don't want to search only for the entire word. I want to search for words that start with some substring. Occasionally, the search term happens to be the entire word.

How can I select on a partial string using LIKE and still match all of these strings?

I tried to create a custom collation using this information, but the server became unstable, and only after a lot of trial and error was I able to revert to the utf8_unicode_ci collation and get the server back to normal.

Edit: There's a problem with this site and some characters don't display correctly. Please see the results in these pastebins:

http://pastebin.com/mckJTLFX

http://pastebin.com/WP87QvgB

Recommended answer

After seeing Marcus Adams' answer, I realized that the REPLACE function could be the solution to this problem, although he didn't mention this function.

I have only two different combining characters (acute and tilde), each combined with ordinary ASCII letters: for example j with tilde, j with acute, m with tilde, s with tilde, and so on. So I just have to remove these two characters when using LIKE.

After searching the manual, I learned about the UNHEX function, which let me write the combining characters on their own in the query so that I could remove them.

The combining tilde (U+0303) is represented by the UTF-8 bytes CC83 in hex, and the combining acute (U+0301) by CC81.
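As a quick cross-check of those hex values outside MySQL, the following Python sketch encodes the two combining marks as UTF-8 and prints the bytes in the same form as HEX(). The helper name utf8_hex is made up for this illustration; the sample strings are rebuilt from the hex column in the question's output.

```python
# Combining marks named in the answer, written as Unicode code points.
COMBINING_TILDE = "\u0303"
COMBINING_ACUTE = "\u0301"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as uppercase hex, similar to MySQL's HEX()."""
    return text.encode("utf-8").hex().upper()


print(utf8_hex(COMBINING_TILDE))  # CC83
print(utf8_hex(COMBINING_ACUTE))  # CC81

# "hu" + combining tilde + "a": the decomposed form from the question.
print(utf8_hex("hu" + COMBINING_TILDE + "a"))  # 6875CC8361

# "hu" + precomposed a-with-tilde (U+00E3): a single code point, no combining mark.
print(utf8_hex("hu\u00e3"))  # 6875C3A3
```

The last two lines show the difference the question runs into: the decomposed word carries a separate CC83 sequence after the base letter, while the precomposed one does not.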

So, the query that solves my problem is this one:

-- Strip the combining tilde (CC83) and combining acute (CC81) before applying LIKE.
SELECT word, REPLACE(REPLACE(word, UNHEX('CC83'), ''), UNHEX('CC81'), '')
FROM oldword
WHERE REPLACE(REPLACE(word, UNHEX('CC83'), ''), UNHEX('CC81'), '') LIKE 'hua%';
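To see what the nested REPLACE calls leave behind, here is a rough Python analogue applied to the four words from the question. It is only a sketch of the string manipulation, not of MySQL's collation rules, and the function name strip_marks is invented for this example.

```python
# The two combining marks the answer removes: tilde (U+0303) and acute (U+0301).
MARKS = ("\u0303", "\u0301")


def strip_marks(word: str) -> str:
    """Delete the combining marks, mirroring REPLACE(REPLACE(word, ...), ...)."""
    for mark in MARKS:
        word = word.replace(mark, "")
    return word


# The four rows from the question, rebuilt from their hex(word) values.
words = ["hu\u0303a", "hu\u00e3", "hua", "hu\u0303\u00e3"]
stripped = [strip_marks(w) for w in words]

for original, result in zip(words, stripped):
    print(original.encode("utf-8").hex().upper(), "->", result.encode("utf-8").hex().upper())
```

After stripping, every word is either hua (687561) or huã with the precomposed ã (6875C3A3). Those are the two forms that LIKE already matched in the question's second query, which is consistent with the rewritten query returning all four rows.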
