How to convert wrongly encoded data to UTF-8?

Problem description

I'm working with data from an old mysql database. There's a table in this database with a string column that has its encoding set to "cp1252 West European (latin1)" (same as Windows-1252). When querying the data from mysql command prompt, data from this field is represented as:

Obamaâ€™s

This is supposed to read

Obama’s

I've tried following the accepted answer for "How to convert an entire MySQL database characterset and collation to UTF-8?" to convert the field to UTF-8 in MySQL, but it makes no difference.

I also tried inserting a new row into that table, using Obama’s as the text for that field (again, from the mysql command prompt). However, this text is correctly represented when I then query the same row I just inserted. I tried performing that insertion both when the field was set to latin1 and when it was set to UTF-8. Same result.

This leads me to believe that when the bad data was inserted into the database, it was first incorrectly encoded by PHP. This is where it gets fuzzy to me.

I can assume that the data was inserted via a web form and processed with PHP. What did PHP do with it before inserting it into the database? Did it convert the string to UTF-8, which, according to the table on this helpful page, uses the three bytes %E2 %80 %99 to represent the ’ character? Do I have that right?

If that's correct, what are my options to repair this data? I'd like to convert the table and its fields to UTF-8 encodings, but that doesn't seem to fix the text. Do I have to write a script that manually changes those characters to what they should be?

Solution

select convert(binary convert(field_name using latin1) using utf8) from table_name

If this displays the text correctly, you can run an UPDATE that sets the column to the same expression.
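To see why the nested conversion works, it may help to replay the byte-level round trip outside the database. The query above re-encodes the stored text as latin1 (recovering the original bytes), drops the character set label with `binary`, and then reads those bytes as UTF-8. The Python sketch below imitates that under the assumption that the damage came from UTF-8 bytes being stored as cp1252 text; the function names `garble` and `repair` are made up for this illustration and are not part of MySQL or PHP.

```python
def garble(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as cp1252 characters."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo it: turn the cp1252 characters back into bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


original = "Obama\u2019s"           # U+2019 is the right single quotation mark
print(original.encode("utf-8"))     # the quote becomes the bytes E2 80 99
broken = garble(original)
print(broken)                       # the mis-decoded form seen in the table
print(repair(broken) == original)   # the round trip restores the text
```

Python's `cp1252` codec rejects the five bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), while MySQL's latin1 accepts them, so this sketch can fail on strings that the SQL conversion would still handle.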
