Convert latin1 characters on a UTF8 table into UTF8


Problem description

Only today I realized that I was missing this in my PHP scripts:

mysql_set_charset('utf8');

All my tables are InnoDB with collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); in my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I INSERTed something with diacritics, for example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

The 'name' contents would be, in this case: JÃ¡uÃ² IÃ±e.
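To confirm that the older rows really are double-encoded, it can help to look at the raw bytes MySQL holds rather than at what the client displays. The query below is only a sketch against the `table`, `a1` and `name` identifiers used in the snippets here; the column aliases are illustrative.

```sql
-- Inspect the stored bytes of a few rows; nothing is modified.
SELECT `a1`,
       `name`,
       HEX(`name`)         AS stored_bytes,  -- raw bytes as hexadecimal
       LENGTH(`name`)      AS byte_length,   -- length in bytes
       CHAR_LENGTH(`name`) AS char_length    -- length in characters
FROM `table`
LIMIT 20;
```

A correctly stored "á" is the two bytes C3A1. If the same character went in through a latin1 connection, it is stored as C383C2A1, because each of the two original bytes was treated as a separate latin1 character and re-encoded as UTF-8.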

Since I fixed the charset between PHP and MySQL, new INSERTs are now stored correctly. However, I want to fix all the older rows that are currently messed up. I have tried many things already, but each attempt breaks the strings at the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP &aacute; (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

$result = mysql_query('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    // Attempted repair: re-encode the fetched UTF-8 string as ISO-8859-15.
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_query('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.mysql_real_escape_string($row['a1']).'"');
}

The UPDATE writes the expected characters, except that the string gets truncated at the character "ă": that character and everything after it are missing from the stored string.

Testing with iconv() (commented out in the code above) gives the same result, even with //IGNORE and //TRANSLIT.

I also tested several charsets, between ISO-8859-1 and ISO-8859-15.

Solution

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like

convert(cast(convert(name using latin1) as binary) using utf8)

It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
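As a rough sketch of how that expression might be applied, the statements below first preview the repair and then write it back. The `table`, `a1` and `name` identifiers come from the question; the `a1 <= 1000` filter is purely hypothetical and stands in for whatever condition separates the old, damaged rows from those inserted after the charset fix. Running the expression over rows that are already stored correctly would likely damage them, so taking a backup first is advisable.

```sql
-- Step 1: preview. Compare the stored value with the candidate repair.
SELECT `a1`,
       `name`,
       CONVERT(CAST(CONVERT(`name` USING latin1) AS BINARY) USING utf8) AS repaired
FROM `table`
WHERE `a1` <= 1000   -- hypothetical cutoff: only rows written before the fix
LIMIT 20;

-- Step 2: apply the same expression once the preview looks right.
UPDATE `table`
SET `name` = CONVERT(CAST(CONVERT(`name` USING latin1) AS BINARY) USING utf8)
WHERE `a1` <= 1000;  -- same hypothetical cutoff as in the preview
```

Doing the repair inside MySQL keeps the whole round trip in the server's own latin1 mapping, which may not match the ISO-8859-1 or ISO-8859-15 tables a client-side conversion would use.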
