How to detect UTF-8 characters in a Latin1 encoded column - MySQL


Problem description

I am about to undertake the tedious and gotcha-laden task of converting a database from Latin1 to UTF-8.

At this point I simply want to check what sort of data I have stored in my tables, as that will determine what approach I should use to convert the data.

Specifically, I want to check whether I have UTF-8 characters in the Latin1 columns. What would be the best way to do this? If only a few rows are affected, then I can just fix this manually.

Option 1. Perform a MySQL dump and use Perl to search for UTF-8 characters?

Option 2. Use MySQL CHAR_LENGTH to find rows with multi-byte characters? e.g.

SELECT name FROM clients WHERE LENGTH(name) != CHAR_LENGTH(name);

Is this enough?

At the moment I have switched my MySQL client encoding to UTF-8.

Solution

Character encoding, like time zones, is a constant source of problems.

What you can do is look for any "high-ASCII" characters as these are either LATIN1 accented characters or symbols, or the first of a UTF-8 multi-byte character. Telling the difference isn't going to be easy unless you cheat a bit.
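One way to "cheat", if you can get at the raw bytes (for example from a dump, as in Option 1), is to test whether the high bytes form valid UTF-8 sequences. This is only a heuristic sketch in Python, not something from the original answer: the function name `classify` and the sample values are made up for illustration, and a genuine Latin1 string that happens to look like valid UTF-8 would be mislabelled.

```python
def classify(raw: bytes) -> str:
    """Guess how a raw column value is encoded (heuristic, hypothetical helper)."""
    if all(b < 0x80 for b in raw):
        return "ascii"        # no high bytes: identical in Latin1 and UTF-8
    try:
        raw.decode("utf-8")   # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "latin1"       # high bytes that are not valid UTF-8
    return "utf8"             # high bytes that form valid UTF-8 sequences


# Illustrative sample values only, not real data.
samples = [b"Bjorn", "Björn".encode("latin-1"), "Björn".encode("utf-8")]
labels = [classify(s) for s in samples]
print(labels)  # ['ascii', 'latin1', 'utf8']
```

Rows labelled "utf8" are the ones worth inspecting by eye before deciding how to convert.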

To figure out what encoding is correct, you just SELECT two different versions and compare visually. Here's an example:

SELECT CONVERT(CONVERT(name USING BINARY) USING latin1) AS latin1, 
       CONVERT(CONVERT(name USING BINARY) USING utf8) AS utf8 
FROM users 
WHERE CONVERT(name USING BINARY) RLIKE CONCAT('[', UNHEX('80'), '-', UNHEX('FF'), ']')

This is made unusually complicated because the MySQL regexp engine seems to ignore things like \x80 and makes it necessary to use the UNHEX() method instead.
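For comparison, the same "any byte from 0x80 to 0xFF" filter is easy to express outside MySQL, which may help if you end up scanning a dump instead. A minimal Python sketch, where `HIGH_BYTE` and the `rows` list are hypothetical names and sample data:

```python
import re

# Byte-level pattern: matches any single byte in the range 0x80-0xFF.
HIGH_BYTE = re.compile(rb"[\x80-\xff]")

# Illustrative raw values: plain ASCII, UTF-8 encoded, Latin1 encoded.
rows = [b"Alice", b"Bj\xc3\xb6rn", b"Ren\xe9"]

# Keep only the values containing at least one high byte.
flagged = [r for r in rows if HIGH_BYTE.search(r)]
print(flagged)
```

As with the WHERE clause, this only narrows the candidates down; it does not say which encoding a flagged value is in.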

This produces results like this:

latin1                utf8
----------------------------------------
BjÃ¶rn                Björn
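The two columns are simply two decodings of the same stored bytes. A small Python sketch of that idea, assuming a value whose bytes are the UTF-8 encoding of "Björn"; Python's `latin-1` codec (ISO-8859-1) is used here as a stand-in for MySQL's latin1, which differs for some bytes in the 0x80-0x9F range:

```python
# Hypothetical stored value: UTF-8 bytes sitting in a Latin1 column.
raw = "Björn".encode("utf-8")

as_latin1 = raw.decode("latin-1")  # how a Latin1 column interprets the bytes
as_utf8 = raw.decode("utf-8")      # what the bytes were meant to be

print(as_latin1, "|", as_utf8)

# Byte count, Latin1 character count, UTF-8 character count.
print(len(raw), len(as_latin1), len(as_utf8))  # 6 6 5
```

This also bears on Option 2 from the question: read as Latin1, every byte is one character, so LENGTH(name) and CHAR_LENGTH(name) agree on a Latin1 column. The counts only differ once the bytes are reinterpreted as UTF-8, as the second column of the query does.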
