MySQL: Querying for unicode entities

Question

I need to find the word Lämmönmyyntipalvelut in the database. The catch is that it is stored in a field whose value was a PHP array converted to JSON with json_encode(), so the special characters have been turned into hex Unicode escapes (ä becomes \u00e4, ö becomes \u00f6).
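
To see what actually ends up in the column, here is a small sketch in which Python's json.dumps stands in for PHP's json_encode(). With its default ensure_ascii=True, json.dumps escapes ä and ö as \u00e4 and \u00f6, which is what json_encode() does by default. The compact separators imitate PHP's output, and the two-element list is a trimmed, hypothetical version of the real array.

```python
import json

# Trimmed stand-in for the PHP array saved in the services column.
services = ["Neuvonta", "Lämmönmyyntipalvelut"]

# ensure_ascii=True (the default) writes non-ASCII characters as \uXXXX;
# separators=(",", ":") drops the spaces Python would otherwise add.
stored = json.dumps(services, separators=(",", ":"))
print(stored)  # ["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut"]
```

The stored text contains a real backslash before every `u00e4`. That backslash is the character the LIKE queries below have trouble matching.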

So my query is:

SELECT * FROM table WHERE (services LIKE '%Lämmönmyyntipalvelut%')

No results. No surprise. Next, query with special characters converted:

SELECT * FROM table WHERE (services LIKE '%L\u00e4mm\u00f6nmyyntipalvelut%')

No results, and I wonder why. Next I tested querying for only the special character:

SELECT * FROM table WHERE (services LIKE '%\u00e4%')

That found what it was supposed to find. Next I started adding characters (an L at the beginning) to see where it went wrong:

SELECT * FROM table WHERE (services LIKE '%L\u00e4%')

No results. Another test:

SELECT * FROM table WHERE (services LIKE '%\u00e4mm%')

That found what it was supposed to find.

So my conclusion is that the backslash is somehow messing things up, but I don't understand how.
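
A simplified model of how MySQL reads the text between the quotes of a string literal reproduces all four experiments. This assumes the default sql_mode, with NO_BACKSLASH_ESCAPES off. \u is not one of MySQL's recognised escapes, so the backslash is dropped before the LIKE pattern is ever evaluated. The function below is a hypothetical Python sketch of the documented escape rules, not MySQL's actual parser.

```python
# Escape sequences MySQL recognises inside string literals (default sql_mode).
# Doubled quotes ('') are not modelled in this sketch.
SPECIAL = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
           "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}


def mysql_literal_value(body: str) -> str:
    """Approximate the string MySQL builds from the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SPECIAL:
                out.append(SPECIAL[nxt])
            elif nxt in "%_":
                # \% and \_ keep their backslash so that LIKE can use it later.
                out.append("\\" + nxt)
            else:
                # Any other escape, such as \u, is just the character itself:
                # the backslash silently disappears.
                out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The services value as json_encode() stored it (raw string keeps the backslashes).
stored = r'["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut"]'

for body in [r"%L\u00e4mm\u00f6nmyyntipalvelut%", r"%\u00e4%",
             r"%L\u00e4%", r"%\u00e4mm%"]:
    pattern = mysql_literal_value(body)
    # Each pattern only has % at both ends, so a substring test stands in for LIKE.
    print(body, "->", pattern, pattern.strip("%") in stored)
```

For the four queries, the loop prints False, True, False, True, which matches the results in the question. Once the parser has turned \u into u, only the patterns that never needed the backslash (u00e4 and u00e4mm) can still match.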

Exact contents of the services field:

["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut",
"Mets\u00e4-/energiapuunkorjuupalvelut"]

Exact query:

SELECT id, uid, company_name, services, logo FROM rekisteroeidy_toimijaks 
WHERE 
    (services LIKE '%L\u00e4mm\u00f6nmyyntipalvelut%' AND 
    services LIKE '%Mets\u00e4-/energiapuunkorjuupalvelut%') 
ORDER BY company_name ASC

I added some line breaks to help readability.

Recommended answer

I have absolutely no idea why, but triple escaping helps!

Well, that's only double-escaping, but yes it works and here's why: in MySQL, there is a second layer of escaping involved when you use the LIKE operator.

services LIKE '%L\\\\u00e4mm\\\\u00f6n%'

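Parsing that MySQL string literal gives you a comparison with the LIKE pattern %L\\u00e4mm\\u00f6n%. Because MySQL treats \ in a LIKE pattern as an escape character, each \\ matches one literal backslash, so the pattern matches the stored string containing L\u00e4mm\u00f6n. With a single backslash in the literal, the parser turns \u into a plain u before LIKE ever sees it, which is why '%L\u00e4%' finds nothing.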
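
To look at the second layer on its own, here is a hypothetical Python model of MySQL's LIKE matching with its default backslash escape character. It receives the pattern after MySQL has already parsed the string literal. For simplicity it is case-sensitive, unlike MySQL's default collations.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\"):
    """Translate a LIKE pattern into an equivalent regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The escape character makes the next character literal:
            # \% matches "%", \\ matches "\", and \u simply matches "u".
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")          # any run of characters
        elif ch == "_":
            parts.append(".")           # exactly one character
        else:
            # Ordinary character (a trailing lone escape is treated as a
            # literal here; that edge case is an assumption of this sketch).
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str, escape: str = "\\") -> bool:
    # Case-sensitive; MySQL's default collations compare case-insensitively.
    return like_to_regex(pattern, escape).fullmatch(value) is not None


stored = r'["Neuvonta","L\u00e4mm\u00f6nmyyntipalvelut"]'
print(like(stored, r"%L\\u00e4mm\\u00f6n%"))  # True: \\ matches the stored backslash
print(like(stored, r"%L\u00e4mm\u00f6n%"))    # False: \u only matches "u"
```

In the second call the pattern would match the string Lu00e4mmu00f6n, with no backslash. That string does not occur in the stored JSON.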

The reason for this is so that you can match strings against a query expression that contains a literal % or _ character. For example, if I want to search a column for the literal string 100%, I can match it against 100\% (written in a query as '100\%') and make sure I'm really getting one hundred percent and not just any string starting with 100.

It's unfortunate that MySQL uses the backslash for both its LIKE escaping and its string literal escaping. This is especially awkward given that you're probably writing the query in an enclosing programming language that also uses backslashes, so you end up with actual triple encoding that looks like "services LIKE '%L\\\\\\\\u00e4mm\\\\\\\\u00f6n%'". Argh!
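
Instead of counting backslashes by hand, you can build up the layers one at a time. In this hypothetical Python sketch, json.dumps stands in for PHP's json_encode(), since both escape ä and ö as \u00e4 and \u00f6. The helper names are made up for illustration. Had the final literal been typed as an ordinary, non-raw Python string, each of its backslashes would have to be doubled again, which is the third layer of encoding described above.

```python
import json


def like_pattern_for_json_term(term: str) -> str:
    """Layers 1 and 2: JSON-encode the term the way the stored data was
    encoded, then escape it for MySQL's LIKE, whose default escape
    character is the backslash."""
    encoded = json.dumps(term)[1:-1]          # drop the surrounding quotes
    escaped = (encoded.replace("\\", "\\\\")  # backslashes first
                      .replace("%", "\\%")
                      .replace("_", "\\_"))
    return "%" + escaped + "%"


def mysql_string_literal(value: str) -> str:
    """Layer 3 (hypothetical helper): quote a value as a MySQL string
    literal under the default sql_mode. Only backslash and ' are handled."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


pattern = like_pattern_for_json_term("Lämmön")
literal = mysql_string_literal(pattern)
print(pattern)  # %L\\u00e4mm\\u00f6n%
print(literal)  # '%L\\\\u00e4mm\\\\u00f6n%'
```

If the query goes through a database driver's placeholder parameters, pass the result of like_pattern_for_json_term() as the parameter value and skip mysql_string_literal(). The driver takes care of the string-literal layer, so only the JSON and LIKE layers remain.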

It's doubly unfortunate given that this behaviour is not ANSI SQL conformant, so you can't rely on it working the same way in other databases. ANSI SQL says that there is no escape character in LIKE patterns by default, so if you want to match a literal % or _ you have to opt in by nominating an escape character of your own, e.g.:

something LIKE '100=%' ESCAPE '='

For cross-database compatibility, it is best always to use the LIKE...ESCAPE form and pick something other than the horrible backslash! (Aside: MySQL's backslashes for SQL string literal escaping aren't ANSI conformant either! But you can turn that misbehaviour off with the NO_BACKSLASH_ESCAPES sql_mode setting.)
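
Following that advice, the pattern can use '=' as the escape character, as in the example above. LIKE then treats backslashes as ordinary characters. This hypothetical Python sketch assumes one of two things: the MySQL session has NO_BACKSLASH_ESCAPES enabled, or the value is sent as a query parameter. Otherwise MySQL's string-literal parser would still strip the \u backslashes before LIKE sees them.

```python
import json


def escape_like(term: str, esc: str = "=") -> str:
    """Escape LIKE wildcards (and the escape char itself) with an explicit
    ESCAPE character instead of the backslash."""
    out = []
    for ch in term:
        if ch in (esc, "%", "_"):
            out.append(esc)
        out.append(ch)
    return "".join(out)


def ansi_string_literal(value: str) -> str:
    """Standard SQL quoting: only ' is special, and it is doubled.
    In MySQL this is only safe with sql_mode NO_BACKSLASH_ESCAPES."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical query fragment for the services search.
term = json.dumps("Lämmön")[1:-1]  # L\u00e4mm\u00f6n, as json_encode() stores it
sql = ("services LIKE " + ansi_string_literal("%" + escape_like(term) + "%")
       + " ESCAPE '='")
print(sql)  # services LIKE '%L\u00e4mm\u00f6n%' ESCAPE '='
```

The backslash in the pattern now appears exactly once, the same way it appears in the stored data. Only the characters '=', % and _ in the search term need escaping.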

Probably a better idea would be to break services out into a second table rather than squashing them into a single string column, i.e. put your schema in First Normal Form. Then you could do a simple lookup of individual values rather than a slow full-table-scan substring match.
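
Here is a minimal sketch of that First Normal Form layout. It uses SQLite through Python's standard sqlite3 module only so that it runs without a MySQL server. The table names company and company_service and the sample companies are hypothetical. Each (company, service) pair gets its own row, so the original two-LIKE query becomes exact comparisons that an index can serve, with no escaping at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (
        id INTEGER PRIMARY KEY,
        company_name TEXT NOT NULL
    );
    -- One row per (company, service) pair instead of a JSON array column.
    CREATE TABLE company_service (
        company_id INTEGER NOT NULL REFERENCES company(id),
        service TEXT NOT NULL,
        PRIMARY KEY (company_id, service)
    );
    CREATE INDEX company_service_by_service ON company_service(service);
""")

conn.executemany("INSERT INTO company (id, company_name) VALUES (?, ?)",
                 [(1, "Example Oy"), (2, "Other Oy")])
conn.executemany(
    "INSERT INTO company_service (company_id, service) VALUES (?, ?)",
    [(1, "Neuvonta"), (1, "Lämmönmyyntipalvelut"),
     (1, "Metsä-/energiapuunkorjuupalvelut"),
     (2, "Lämmönmyyntipalvelut")],
)

# Companies offering ALL of the wanted services (the original AND of two LIKEs).
wanted = ["Lämmönmyyntipalvelut", "Metsä-/energiapuunkorjuupalvelut"]
placeholders = ", ".join("?" for _ in wanted)
rows = conn.execute(
    f"""
    SELECT c.id, c.company_name
    FROM company AS c
    JOIN company_service AS s ON s.company_id = c.id
    WHERE s.service IN ({placeholders})
    GROUP BY c.id, c.company_name
    HAVING COUNT(DISTINCT s.service) = ?
    ORDER BY c.company_name ASC
    """,
    [*wanted, len(wanted)],
).fetchall()
print(rows)  # [(1, 'Example Oy')]
```

Company 2 offers only one of the two services, so the HAVING clause filters it out.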
