在插入错误的编码后更新MySQL数据库中的数据 [英] Updating data in MySQL database after inserting in the wrong encoding
问题描述
我使用PHP和MySQL来保存阿拉伯数据。
我的数据库排序规则是: utf8_general_ci
p>
我的数据库字符集是: utf8
我在插入之前没有使用 SET NAMES utf8
,所以数据插入数据库中的奇怪字符,但在我的应用程序中正确显示。在使用 SET NAMES utf8
后,即使在我的应用程序中,数据也被正确插入,但旧的数据不会以阿拉伯字符显示(新数据可以)。
如何在应用程序和MySQL数据库中更新所有显示为阿拉伯字母的数据?
更新
当我检查两个字符串(插入的字符串和我要转换的字符串)的编码时使用 mb_detect_encoding
function我得到两个字符串都是UTF-8。
这里是两个字符串的例子:
插入的文字:اÙإسÙ...باÙعربية
我需要转换为的文本:الإسمبالعربية
- 客户端有编码为utf8的字符;和
-
SET NAMES latin1
谎称客户端具有latin1编码;和 - 表格中声明为
CHARACTER SET utf8
的列。
让我们来看看e-acute发生了什么:é
。
- 十六进制,在utf8是2字节:
C3A9
。 -
latin1
将其视为2个latin1编码的字符Ã
和©
(hex:C3
和A9
) -
CHARACTER SET utf8
,这两个字符需要转换。
已转换为utf8(hex
C383
)和©$因此,存储了4个字节(十六进制
C383C2A9 $>),并且存储了4个字节(hex
C2A9
)
< c $ c>)
当读取它时,执行相反的步骤,
和最终用户可能注意到没有什么不对。错误:
- 存储的数据是应用数据的2倍(亚洲语言为3倍)。
- 等于,大于等的比较可能无法正常工作。
-
ORDER BY
可能无法正常工作
这样会修复您的资料:
UPDATE ... SET col = CONVERT(BINARY(CONVERT(
CONVERT(UNHEX(col)USING utf8)
使用latin1))使用utf8);
I am using PHP and MySQL for saving Arabic data.
My database collation is : utf8_general_ci
My database Character set is : utf8
At first I did not use SET NAMES utf8
before insertion so the data was inserted in strange characters in the database but was displayed properly in my application. After using SET NAMES utf8
, data is inserted properly but the old data is not displayed in Arabic characters even in my application (The new data is ok) .
How can I update all the data to be displayed in Arabic letters in both my application and MySQL database?
Update
When I check the encoding of the both of strings ( the one that is inserted and the one that I want to convert to ) using mb_detect_encoding
function I get that both of the strings are UTF-8 .
Here is an example of the two strings :
the text inserted : الإسم بالعربية
the text I need to convert to : الإسم بالعربية
You suffer from "double encoding".
Here's what happened.
- The client had characters encoded as utf8; and
SET NAMES latin1
lied by claiming that the client had latin1 encoding; and- The column in the table declared
CHARACTER SET utf8
.
Let's walk through what happens to e-acute: é
.
- The hex for that, in utf8 is 2 bytes:
C3A9
. SET NAMES latin1
saw it as 2 latin1-encoded charactersÃ
and©
(hex:C3
andA9
)- Since the target was
CHARACTER SET utf8
, those 2 characters needed to be converted.Ã
was converted to utf8 (hexC383
) and©
(hexC2A9
) - So, 4 bytes were stored (hex
C383C2A9
)
When reading it back out, the reverse steps were performed, and the end user possibly noticed nothing wrong. What is wrong:
- The data stored is 2 times as big as it should be (3x for Asian languages).
- Comparisions for equal, greater than, etc may not work as expected.
ORDER BY
may not work as expected.
Something like this will repair your data:
UPDATE ... SET col = CONVERT(BINARY(CONVERT(
CONVERT(UNHEX(col) USING utf8)
USING latin1)) USING utf8);
More discussion and More examples of fixing it
这篇关于在插入错误的编码后更新MySQL数据库中的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!