How to convert wrongly encoded data to UTF-8?
I'm working with data from an old MySQL database. One table in this database has a string column whose encoding is set to "cp1252 West European (latin1)" (the same as Windows-1252). When I query the data from the mysql command prompt, data from this field is displayed as:
Obamaâ€™s
This is supposed to read
Obama’s
I've tried following the accepted answer to "How to convert an entire MySQL database characterset and collation to UTF-8?" to convert the field to UTF-8 in MySQL, but it makes no difference.
I also tried inserting a new row into that table, using Obama’s as the text for that field (again, from the mysql command prompt). However, this text is correctly represented when I then query the same row I just inserted. I tried performing that insertion both when the field was set to latin1 and when it was set to UTF-8. Same result.
This leads me to believe that when the bad data was inserted into the database, it was first incorrectly encoded by PHP. This is where it gets fuzzy to me.
I can assume that the data was inserted via a web form and processed with PHP. What did PHP do with it before inserting it into the database? Did it convert the string to UTF-8, which, according to the table on this helpful page, uses the three bytes %E2 %80 %99 to represent the ’ character? Do I have that right?
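The byte-level guess above can be checked with a small sketch. It is written in Python purely for illustration (the question concerns PHP and MySQL), and it assumes the text was encoded as UTF-8 and later read back as Windows-1252; the variable names are hypothetical.

```python
# Illustrative sketch: how UTF-8 bytes read as Windows-1252 become mojibake.
# Assumption: the form data was UTF-8 and was later interpreted as cp1252.

original = "Obama\u2019s"  # U+2019 is the right single quotation mark

# In UTF-8 the quotation mark takes three bytes: E2 80 99.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # bytes repr, safe to print on any console

# Reading those same bytes as Windows-1252 yields one character per byte:
# E2 -> U+00E2 (a with circumflex), 80 -> U+20AC (euro sign),
# 99 -> U+2122 (trade mark sign).
misread = utf8_bytes.decode("cp1252")
assert misread == "Obama\u00e2\u20ac\u2122s"
```

If the assumption holds, the three characters seen in place of the quotation mark are exactly the three UTF-8 bytes shown one by one.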
If that's correct, what are my options to repair this data? I'd like to convert the table and its fields to UTF-8 encodings, but that doesn't seem to fix the text. Do I have to write a script that manually changes those characters to what they should be?
select convert(binary convert(field_name using latin1) using utf8) from table_name
If this displays the text correctly, you can apply the same expression in an UPDATE to fix the stored data.
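If a script turns out to be more convenient than SQL, the same round trip (reinterpret the characters as Windows-1252 bytes, then decode those bytes as UTF-8) might look like the sketch below. The function name `repair_mojibake` is hypothetical. The sketch uses Python's `cp1252` codec, which leaves a few byte values undefined that MySQL's latin1 accepts, so results could differ for such rows.

```python
# A minimal sketch of the repair in a script, assuming the stored text is
# UTF-8 that was misread as Windows-1252. repair_mojibake is a hypothetical
# helper, not part of any library.

def repair_mojibake(text: str) -> str:
    """Return the repaired string, or the input unchanged if the
    round trip does not produce valid UTF-8."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text is probably not mojibake of this kind; leave it alone.
        return text


broken = "Obama\u00e2\u20ac\u2122s"   # mojibake as displayed by the client
fixed = repair_mojibake(broken)        # "Obama" + U+2019 + "s"
already_ok = repair_mojibake("Obama\u2019s")  # returned unchanged
```

Returning the input unchanged on failure means rows that were inserted correctly later on are not damaged when the script runs over the whole table.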