MySQL: Querying for unicode entities


Question

I need to find the word Lämmönmyyntipalvelut in the database. The catch is that it is stored in a field whose value was a PHP array converted to JSON with json_encode(), so the special characters have been scrambled into hex unicode escapes.

So my query is:

SELECT * FROM table WHERE (services LIKE '%Lämmönmyyntipalvelut%')

No results. No surprise. Next, a query with the special characters converted:

SELECT * FROM table WHERE (services LIKE '%L\u00e4mm\u00f6nmyyntipalvelut%')

No results, and I wonder why. Next I tested querying for only the special character:

SELECT * FROM table WHERE (services LIKE '%\u00e4%')

That found what it was supposed to find. Next I started adding characters (an L at the beginning) to see where it went wrong:

SELECT * FROM table WHERE (services LIKE '%L\u00e4%')

No results. Another test:

SELECT * FROM table WHERE (services LIKE '%\u00e4mm%')

That found what it was supposed to find.

So my conclusion is that the backslash is somehow messing things up, but I don't understand how.

The exact contents of the services field:

["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut",
"Mets\u00e4-\/energiapuunkorjuupalvelut"]

The exact query:

SELECT id, uid, company_name, services, logo FROM rekisteroeidy_toimijaks 
WHERE 
    (services LIKE '%L\u00e4mm\u00f6nmyyntipalvelut%' AND 
    services LIKE '%Mets\u00e4-\/energiapuunkorjuupalvelut%') 
ORDER BY company_name ASC

I added some line breaks to help readability.

Recommended answer

I have absolutely no idea why, but triple escaping helps!

Well, that's only double-escaping, but yes, it works, and here's why: in MySQL, there is a second layer of escaping involved when you use the LIKE operator.

services LIKE '%L\\\\u00e4mm\\\\u00f6n%'

Parsing that MySQL string literal gives you a comparison with the LIKE pattern %L\\u00e4mm\\u00f6n%. Because MySQL treats \ in a LIKE pattern as an escape character, that pattern actually matches any value containing the literal text L\u00e4mm\u00f6n.
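The two layers can be traced by hand with a small simulation. This is only a sketch: `parse_mysql_literal`, `like_to_regex` and `like` are hypothetical helpers that model MySQL's default behaviour in simplified form (no character sets, collations or case-insensitivity), not MySQL's actual implementation. It also suggests why the single-character tests in the question seemed to work: in a string literal, an unrecognised sequence such as \u simply loses its backslash, so '%\u00e4%' was really searching for the plain text u00e4.

```python
import re

def parse_mysql_literal(body: str) -> str:
    """Layer 1 (simplified): what MySQL reads from the body of a quoted literal.

    Known escapes are translated; \\% and \\_ are kept for LIKE to interpret;
    for any other sequence the backslash is dropped (so \\u becomes u).
    """
    known = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "b": "\b",
             "Z": "\x1a", "\\": "\\", "'": "'", '"': '"'}
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append("\\" + nxt if nxt in "%_" else known.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Layer 2 (simplified): LIKE with backslash as the escape character."""
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # next char is literal
            i += 2
        elif ch == "%":
            parts.append(".*")
            i += 1
        elif ch == "_":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)

def like(value: str, literal_body: str) -> bool:
    """Would `value LIKE '<literal_body>'` match, under this simplified model?"""
    return like_to_regex(parse_mysql_literal(literal_body)).fullmatch(value) is not None

# The stored JSON text from the question (raw string: backslashes are real).
stored = r'["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut"]'

print(like(stored, r"%\u00e4%"))                    # pattern becomes %u00e4%
print(like(stored, r"%L\u00e4%"))                   # pattern becomes %Lu00e4%
print(like(stored, r"%L\\u00e4%"))                  # LIKE still eats the backslash
print(like(stored, r"%L\\\\u00e4mm\\\\u00f6n%"))    # backslash survives both layers
```

Under this model only the first and last patterns match, which lines up with what the question observed and with the four-backslash form above.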

The reason for this is so that you can match strings against a pattern that contains a literal % or _ character. For example, if I want to search a column for the literal string 100%, I can match it against 100\% (written in a query as '100\\%') and make sure I'm really getting one hundred percent and not just any string starting with a hundred.

It's unfortunate that MySQL uses backslash for both its LIKE pattern escaping and its string literal escaping, especially given that you're probably writing in an enclosing programming language that also uses backslash escapes, so you end up with actual triple-encoding, which looks like "services LIKE '%L\\\\\\\\u00e4mm\\\\\\\\u00f6n%'" - argh!
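Rather than counting backslashes by hand, each layer can be applied by its own small function. The sketch below uses two hypothetical helpers, `like_escape` and `mysql_quote`, and assumes MySQL's default sql_mode (backslash escapes on) and the default LIKE escape character. Python's `json.dumps` stands in for PHP's json_encode() here, which is an assumption that holds for this word but not in general (PHP also escapes / as \/ by default).

```python
import json

def like_escape(text: str) -> str:
    """Escape text so a LIKE pattern (escape char: backslash) matches it literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def mysql_quote(text: str) -> str:
    """Write text as a MySQL string literal, assuming the default sql_mode."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"

# Layer 0: the text as it is stored, i.e. the JSON-escaped form of the word.
needle = json.dumps("Lämmönmyyntipalvelut")[1:-1]   # strip the surrounding quotes

# Layer 1: make it safe inside a LIKE pattern, then add the wildcards.
pattern = "%" + like_escape(needle) + "%"

# Layer 2: make the pattern safe inside a MySQL string literal.
literal = mysql_quote(pattern)

print(needle)    # one backslash before each u
print(pattern)   # two backslashes
print(literal)   # four backslashes, as in the answer
```

In real code it is usually preferable to send the pattern as a bound query parameter, so that the driver takes care of the string-literal layer and only `like_escape` is needed.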

It's doubly unfortunate given that this behaviour is not ANSI SQL conformant and can't be relied on in other databases. ANSI SQL says that there is no escape character in LIKE patterns by default, so if you want to match a literal % or _ you have to opt in by nominating an escape character of your own, e.g.:

something LIKE '100=%' ESCAPE '='
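The ESCAPE form is easy to try out with Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption for the sake of a runnable example, not MySQL itself), and the `notes` table is hypothetical. Because = is nominated as the escape character and the pattern is passed as a bound parameter, the backslash in the stored JSON needs no doubling at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [("100%",), ("100 apples",), ("1000",),
     (r'["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut"]',)],
)

# Without an escape character, % is always a wildcard.
wild = conn.execute("SELECT body FROM notes WHERE body LIKE '100%'").fetchall()

# With ESCAPE '=', the sequence =% means a literal percent sign.
exact = conn.execute(
    "SELECT body FROM notes WHERE body LIKE '100=%' ESCAPE '='"
).fetchall()

# The backslash is now an ordinary character in the pattern.
json_hits = conn.execute(
    "SELECT body FROM notes WHERE body LIKE ? ESCAPE '='",
    (r"%L\u00e4mm\u00f6n%",),
).fetchall()

print(len(wild), exact, len(json_hits))
```

In MySQL the same ESCAPE clause removes the LIKE layer of backslash handling; the string-literal layer still applies unless the pattern is bound as a parameter or NO_BACKSLASH_ESCAPES is enabled.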

For cross-database compatibility, it is best always to use the LIKE ... ESCAPE form, and to pick something other than the horrible backslash! (Aside: MySQL's backslashes for SQL string literal escaping aren't ANSI conformant either! But you can turn that misbehaviour off with the NO_BACKSLASH_ESCAPES sql_mode setting.)

Probably a better idea would be to break services out into a second table rather than squashing them into a single string column, i.e. put your schema in First Normal Form. Then you could do a simple lookup of individual values rather than a slow full-table-scan substring match.
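A minimal sketch of that normalized layout, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. The `companies` and `company_services` tables, the index name and the company names are all hypothetical. The JSON array from the question is decoded once, and each service is stored as its own row in plain unicode, so the lookup needs no LIKE and no escaping.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (
    id INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL
);
CREATE TABLE company_services (
    company_id INTEGER NOT NULL REFERENCES companies(id),
    service TEXT NOT NULL,
    PRIMARY KEY (company_id, service)
);
CREATE INDEX idx_company_services_service ON company_services(service);
""")

def add_company(name: str, services_json: str) -> None:
    """Insert one company and one row per service decoded from its JSON array."""
    company_id = conn.execute(
        "INSERT INTO companies (company_name) VALUES (?)", (name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO company_services (company_id, service) VALUES (?, ?)",
        [(company_id, service) for service in json.loads(services_json)],
    )

# The first value is the services field from the question.
add_company(
    "Example Oy",
    r'["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut",'
    r'"Mets\u00e4-\/energiapuunkorjuupalvelut"]',
)
add_company("Other Oy", r'["Neuvonta"]')

# Companies offering BOTH services: exact, index-friendly comparisons.
wanted = ["Lämmönmyyntipalvelut", "Metsä-/energiapuunkorjuupalvelut"]
matches = conn.execute(
    """
    SELECT c.company_name
    FROM companies AS c
    JOIN company_services AS s ON s.company_id = c.id
    WHERE s.service IN (?, ?)
    GROUP BY c.id
    HAVING COUNT(DISTINCT s.service) = 2
    ORDER BY c.company_name ASC
    """,
    wanted,
).fetchall()

print(matches)
```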
