How can I match capital ÅÄÖ in MySQL REGEXP


Question

When I do a REGEXP comparison in MySQL, I get some strange results for the capital versions of the Swedish characters. I am using the utf8_swedish_ci collation and I want to find capitalized words.

SELECT 'Öster' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' should return 1 and SELECT 'öster' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' should return 0, but I get the opposite result. Words that start with a plain ASCII letter behave as expected:

SELECT 'Öster' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' # returns 0 (incorrect)
SELECT 'öster' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' # returns 1 (incorrect)
SELECT 'Söder' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' # returns 1 (correct)
SELECT 'söder' REGEXP BINARY '^[A-ZÅÄÖ][a-zåäö]+$' # returns 0 (correct)

If I use REGEXP instead of REGEXP BINARY, 'söder' also matches (which is not what I want), and even then 'Öster' still does not match.

What can I do about this?

Recommended answer

I realize you've found a fix, but I wanted to explain why it works. REGEXP in MySQL works with bytes, not with "characters" (this applies to the byte-oriented regex library used before MySQL 8.0.4; later versions use ICU, which is Unicode-aware). Å, Ä, Ö, å, ä, and ö are all two-byte characters in UTF-8. When they are used inside the regex [ ] construct, the regex engine sees each of these bytes individually and only attempts to match a single byte, rather than the two bytes that compose the whole character. If you decompose these characters into their constituent bytes, you can see why some matches happened by fluke.
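The decomposition described above can be reproduced outside MySQL. The following is a minimal sketch, assuming that Python's `re` module applied to UTF-8 encoded `bytes` is a fair stand-in for a byte-oriented regex engine; `bytewise_match` is a hypothetical helper written for this illustration, not a MySQL function.

```python
import re

# Hypothetical helper: model a byte-oriented regex engine by matching
# the UTF-8 bytes of both the pattern and the text, not Unicode characters.
def bytewise_match(pattern: str, text: str) -> bool:
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None

PATTERN = "^[A-ZÅÄÖ][a-zåäö]+$"

# Each Swedish letter encodes to two bytes, and all six share the lead byte 0xC3.
for ch in "ÅÄÖåäö":
    print(ch, ch.encode("utf-8").hex(" "))

# Apply the pattern from the question to the four test words.
for word in ["Öster", "öster", "Söder", "söder"]:
    print(word, int(bytewise_match(PATTERN, word)))
```

Under this model, both bracket expressions contain the byte C3. For 'Öster' (C3 96 73 ...), the first class consumes C3 and the second class then rejects 96. For 'öster' (C3 B6 73 ...), the first class consumes C3 and the second class accepts B6, so the lowercase word matches by accident.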

Your fix of using the regex '^([A-Z]|Å|Ä|Ö)[a-zåäö]+$' technically works, but only by chance: the bytes that compose å, ä, and ö happen not to combine into any other well-formed UTF-8 character, so no unintended well-formed string can accidentally match.

I would recommend using '^([A-Z]|Å|Ä|Ö)([a-z]|å|ä|ö)+$' for clarity, since the alternation matches each two-byte character as a whole.
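To compare the two patterns, here is a minimal sketch under the same assumption as before: Python's `re` on UTF-8 `bytes` models a byte-oriented engine, and `bytewise_match` is a hypothetical helper, not part of MySQL. The byte string `stray` is an invented, ill-formed input used only to show where the two patterns differ.

```python
import re

# Hypothetical helper: match a UTF-8 encoded pattern against raw bytes.
def bytewise_match(pattern: str, text: bytes) -> bool:
    return re.search(pattern.encode("utf-8"), text) is not None

INTERIM = "^([A-Z]|Å|Ä|Ö)[a-zåäö]+$"
RECOMMENDED = "^([A-Z]|Å|Ä|Ö)([a-z]|å|ä|ö)+$"

words = ["Öster", "öster", "Söder", "söder"]
for pattern in (INTERIM, RECOMMENDED):
    # Both patterns accept only the capitalized words.
    print(pattern, [int(bytewise_match(pattern, w.encode("utf-8"))) for w in words])

# An ill-formed byte string: 'A' followed by a lone continuation byte 0xA5.
stray = b"A\xa5"
print(bytewise_match(INTERIM, stray))      # byte class accepts the stray byte
print(bytewise_match(RECOMMENDED, stray))  # alternation needs the full C3 A5 pair
```

Both patterns agree on well-formed input. They differ only on byte sequences that are not valid UTF-8, which the alternation form rejects because every multi-byte letter must appear as a complete pair.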
