How to correct double-encoded UTF-8 strings sitting in MySQL utf8_general_ci fields?

Problem description

I have to redesign a class where (amongst other things) UTF-8 strings are double-encoded wrongly:

$string = iconv('ISO-8859-1', 'UTF-8', $string);  // first conversion to UTF-8
// ...
$string = utf8_encode($string);                   // second conversion: treats the UTF-8 bytes as ISO-8859-1 again
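
To see what this double conversion does at the byte level, the following MySQL sketch reproduces it for a single ä. It uses hex literals with character set introducers so that the result does not depend on the connection character set. The literals and the expected hex strings are illustrative, worked out by hand from the UTF-8 encoding of ä (C3 A4); they are not taken from the original post.

```sql
-- The UTF-8 bytes of 'ä' are C3 A4. Misreading them as Latin-1 and
-- converting to UTF-8 a second time yields the double-encoded form.
SELECT HEX(CONVERT(_latin1 X'C3A4' USING utf8)) AS double_encoded;
-- expected: C383C2A4

-- Reversing it: convert back to Latin-1, then reinterpret the raw bytes as UTF-8.
SELECT HEX(CONVERT(CAST(CONVERT(_utf8 X'C383C2A4' USING latin1) AS BINARY) USING utf8)) AS repaired;
-- expected: C3A4
```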

These faulty strings have been saved into multiple table fields all over a MySQL database. All fields being affected use collation utf8_general_ci.

Usually I'd set up a little PHP patch script that loops through the affected tables, SELECTs the records, corrects the faulty ones by running utf8_decode() on the double-encoded fields, and UPDATEs them.

As I have many huge tables this time, and the error only affects German umlauts (äöüßÄÖÜ), I'm wondering whether there is a smarter or faster solution than that.

Are pure MySQL solutions like the following safe and recommendable?

 UPDATE `table` SET `col` = REPLACE(`col`, 'Ã¤', 'ä');
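
A REPLACE-based fix would need one substitution per affected character. The following is a minimal sketch, assuming both PHP conversions treated their input as ISO-8859-1, so that each original two-byte character C3 xx ended up stored as C3 83 C2 xx. Hex literals are used because string literals typed into a client depend on the connection character set. The byte sequences are derived by hand and should be compared against HEX(`col`) on real rows before any UPDATE is run.

```sql
-- One nested REPLACE per affected character: double-encoded bytes -> correct UTF-8 bytes.
UPDATE `table` SET `col` =
  REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(`col`,
    _utf8 X'C383C2A4', _utf8 X'C3A4'),   -- ä
    _utf8 X'C383C2B6', _utf8 X'C3B6'),   -- ö
    _utf8 X'C383C2BC', _utf8 X'C3BC'),   -- ü
    _utf8 X'C383C29F', _utf8 X'C39F'),   -- ß
    _utf8 X'C383C284', _utf8 X'C384'),   -- Ä
    _utf8 X'C383C296', _utf8 X'C396'),   -- Ö
    _utf8 X'C383C29C', _utf8 X'C39C');   -- Ü
```

This only touches the listed sequences, but it still scans every row and has to be repeated for each affected column.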

Any other solutions/best practices?

Recommended answer

Alter the table to change the column's character set to Latin-1. MySQL converts the stored text from UTF-8 to Latin-1, which undoes one layer of encoding: you now have singly-encoded UTF-8 bytes, but they sit in a column that is declared as Latin-1.

Then change the column's character set back to UTF-8 by way of the binary character set. Going through binary means MySQL does not convert the characters at any point; it only changes how the stored bytes are labeled.

ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET latin1;
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET binary;
ALTER TABLE MyTable MODIFY MyColumn ... CHARACTER SET utf8;

(I believe this is the correct syntax; put the appropriate column type where the ... is.)
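
As a concrete sketch of the steps above, the statements below first preview the repair without changing anything and then run the three conversions. `VARCHAR(255)` is a hypothetical column type standing in for the real one. The example also assumes every row in the column is double-encoded, since the ALTER statements convert all rows alike. One further point to check: MySQL documents its latin1 as cp1252-based, so double-encoded characters whose second byte lies in 0x80 to 0x9F (here Ä, Ö, Ü and ß, if the PHP conversions used strict ISO-8859-1) may not survive the first conversion. The preview is meant to surface that before any data is altered.

```sql
-- Preview: the same latin1 -> binary -> utf8 round trip, done on the fly.
-- X'C383' is the UTF-8 encoding of 'Ã', which starts every double-encoded umlaut;
-- it also matches a legitimately stored 'Ã', so inspect the output by eye.
SELECT MyColumn,
       CONVERT(CAST(CONVERT(MyColumn USING latin1) AS BINARY) USING utf8) AS repaired
FROM MyTable
WHERE LOCATE(X'C383', CAST(MyColumn AS BINARY)) > 0
LIMIT 20;

-- The conversion itself. MODIFY replaces the whole column definition, so any
-- NOT NULL / DEFAULT attributes of the real column must be restated here.
ALTER TABLE MyTable MODIFY MyColumn VARCHAR(255) CHARACTER SET latin1;
ALTER TABLE MyTable MODIFY MyColumn VARCHAR(255) CHARACTER SET binary;
ALTER TABLE MyTable MODIFY MyColumn VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
```

Rows where `repaired` comes back NULL, truncated, or containing `?` need a closer look before the ALTER statements are run, ideally on a copy of the table.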
