Convert latin1 characters on a UTF8 table into UTF8

Problem description

Only today did I realize that my PHP scripts were missing this call:

mysql_set_charset('utf8');

All my tables are InnoDB with collation "utf8_unicode_ci", and all my VARCHAR columns use "utf8_unicode_ci" as well. My PHP scripts call mb_internal_encoding('UTF-8');, and all my PHP files are encoded as UTF-8.

So, until now, every time I ran an INSERT containing diacritics, for example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

The 'name' contents would be, in this case: Jáuò Iñe.

Since I fixed the charset between PHP and MySQL, new INSERTs are stored correctly. However, I want to fix all the older rows that are currently garbled. I have tried many things already, but each attempt breaks the string at the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP &aacute; (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

// Read every row back over the utf8 connection and try to repair it in PHP.
$result = mysql_query('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    // Attempted repair: re-encode the fetched UTF-8 text as ISO-8859-15.
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_query('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}

The UPDATE writes the expected characters, except that the string is truncated at the character "ă": that character and everything after it are missing from the stored string.

Testing with iconv() (commented out in the code) gives the same result, even with //IGNORE and //TRANSLIT.

I also tested several charsets between ISO-8859-1 and ISO-8859-15.

Recommended answer

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL expression like

CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8)

You may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
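
As a rough sketch of how that expression might be applied to the existing rows: keep a copy of the data, preview the result, then update only the rows written before the connection charset was fixed. The names `table`, `name` and `a1` come from the question. The backup table name `table_backup` and the cutoff value 1000 are hypothetical, and the sketch assumes `a1` is an increasing numeric key, which the question does not state.

```sql
-- Keep a copy of the data before changing anything (table_backup is a hypothetical name).
CREATE TABLE `table_backup` AS SELECT * FROM `table`;

-- Preview: compare the stored value with the candidate repair, row by row.
SELECT `a1`,
       `name` AS stored,
       CONVERT(CAST(CONVERT(`name` USING latin1) AS BINARY) USING utf8) AS repaired
FROM `table`;

-- Apply the repair only to rows written before the charset fix.
-- The cutoff 1000 is a placeholder; pick it by inspecting the preview output.
UPDATE `table`
SET `name` = CONVERT(CAST(CONVERT(`name` USING latin1) AS BINARY) USING utf8)
WHERE `a1` <= 1000;
```

Rows inserted after mysql_set_charset('utf8') was added are already stored correctly, so running the conversion over them as well would likely damage them; that is why the UPDATE is restricted rather than applied to the whole table.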
