在插入错误的编码后更新MySQL数据库中的数据 [英] Updating data in MySQL database after inserting in the wrong encoding

查看:169
本文介绍了在插入错误的编码后更新MySQL数据库中的数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用PHP和MySQL来保存阿拉伯数据。



我的数据库排序规则是: utf8_general_ci p>

我的数据库字符集是: utf8



我在插入之前没有使用 SET NAMES utf8 ,所以数据插入数据库中的奇怪字符,但在我的应用程序中正确显示。在使用 SET NAMES utf8 后,即使在我的应用程序中,数据也被正确插入,但旧的数据不会以阿拉伯字符显示(新数据可以)。



如何在应用程序和MySQL数据库中更新所有显示为阿拉伯字母的数据?



更新



当我检查两个字符串(插入的字符串和我要转换的字符串)的编码时使用 mb_detect_encoding function我得到两个字符串都是UTF-8。



这里是两个字符串的例子:



插入的文字:اÙإسÙ...باÙعربÙ​​ŠØ©
我需要转换为的文本:الإسمبالعربية

解决方案




  • 客户端有编码为utf8的字符;和

  • SET NAMES latin1 谎称客户端具有latin1编码;和

  • 表格中声明为 CHARACTER SET utf8 的列。



让我们来看看e-acute发生了什么:é


  1. 十六进制,在utf8是2字节: C3A9

  2. latin1 将其视为2个latin1编码的字符 é(hex: C3 A9

  3. CHARACTER SET utf8 ,这两个字符需要转换。
    已转换为utf8(hex C383 )和© C383C2A9 ),并且存储了4个字节(hex C2A9

  4. < c $ c>)

当读取它时,执行相反的步骤,
和最终用户可能注意到没有什么不对。错误:




  • 存储的数据是应用数据的2倍(亚洲语言为3倍)。

  • 等于,大于等的比较可能无法正常工作。

  • ORDER BY 可能无法正常工作



这样会修复您的资料:

  UPDATE ... SET col = CONVERT(BINARY(CONVERT(
CONVERT(UNHEX(col)USING utf8)
使用latin1))使用utf8);

更多讨论
更多示例修复它


I am using PHP and MySQL for saving Arabic data.

My database collation is : utf8_general_ci

My database Character set is : utf8

At first I did not use SET NAMES utf8 before insertion so the data was inserted in strange characters in the database but was displayed properly in my application. After using SET NAMES utf8, data is inserted properly but the old data is not displayed in Arabic characters even in my application (The new data is ok) .

How can I update all the data to be displayed in Arabic letters in both my application and MySQL database?

Update

When I check the encoding of the both of strings ( the one that is inserted and the one that I want to convert to ) using mb_detect_encoding function I get that both of the strings are UTF-8 .

Here is an example of the two strings :

the text inserted : الإسم بالعربية the text I need to convert to : الإسم بالعربية

解决方案

You suffer from "double encoding".

Here's what happened.

  • The client had characters encoded as utf8; and
  • SET NAMES latin1 lied by claiming that the client had latin1 encoding; and
  • The column in the table declared CHARACTER SET utf8.

Let's walk through what happens to e-acute: é.

  1. The hex for that, in utf8 is 2 bytes: C3A9.
  2. SET NAMES latin1 saw it as 2 latin1-encoded characters à and © (hex: C3 and A9)
  3. Since the target was CHARACTER SET utf8, those 2 characters needed to be converted. Ã was converted to utf8 (hex C383) and © (hex C2A9)
  4. So, 4 bytes were stored (hex C383C2A9)

When reading it back out, the reverse steps were performed, and the end user possibly noticed nothing wrong. What is wrong:

  • The data stored is 2 times as big as it should be (3x for Asian languages).
  • Comparisions for equal, greater than, etc may not work as expected.
  • ORDER BY may not work as expected.

Something like this will repair your data:

UPDATE ... SET col = CONVERT(BINARY(CONVERT(
                         CONVERT(UNHEX(col) USING utf8)
                         USING latin1)) USING utf8);

More discussion and More examples of fixing it

这篇关于在插入错误的编码后更新MySQL数据库中的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆