UTF-8 strings in a MySQL database got messed up after configuration change

Views: 136  Published: 2020/5/15 4:16:00  Tags: php mysql utf-8

Question

I have a MySQL database of strings that I left dormant for a while. Now that I have picked it up again, I noticed that all the special characters are garbled. My ISP moved the server to a different machine in the meantime, and I suspect that is when it happened.

The database was populated by a PHP script. Everything was supposed to be in UTF-8; that is what the database is set to.

However, this is what a string looks like now:

fÃƒÂªte

Those four special characters are supposed to be a single character, ê; the string is meant to be fête.

At first glance this looks like text that was simply re-encoded twice, but that doesn't seem to work out. Those four characters in hex are:

C3 83 C6 92 C3 82 C2 AA

This looks very much like UTF-8, so if we decode it, we get

C3 3F C2 AA

This isn't quite UTF-8 (because of the 3F), but let's decode it again:

FF AA

That isn't UTF-8 either.

The ê character is U+00EA (byte EA in ISO-8859-1); in UTF-8 that would be C3 AA.

Another example: the Spanish inverted question mark (¿) is stored as C8 83 E2 80 9A C3 82 C2, which decodes to C3 3F 82 BF, which again isn't proper UTF-8 (it translates to FF 82 BF). The expected character for ¿ is U+00BF, i.e. C2 BF in proper UTF-8.

What happened here? How did the characters get messed up? More importantly, how do I fix it?

(Side note: the new server requires me to call mysql_set_charset("utf8"); otherwise strings get messed up too, although in the usual "UTF-8 read as latin1" fashion, not in the weird fashion seen above.)

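TL;DR:

MySQL database populated in UTF-8 by a PHP script.
Left dormant for years; the server was migrated in the meantime.
Now the characters are garbled, see above.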

Recommended answer

C3 83 C6 92 C3 82 C2 AA

This looks very much like UTF-8, so if we decode it, we get

C3 3F C2 AA

That is what you get if you treat the byte sequence as UTF-8 and then encode the result as ISO-8859-1. The 3F is a ?, inserted as a replacement character because UTF-8 C6 92 is U+0192 (ƒ), which does not exist in ISO-8859-1. It does exist in Windows code page 1252 (Western European), an encoding very similar to ISO-8859-1, where it is byte 0x83. Encoding as cp1252 instead therefore gives:

C3 83 C2 AA
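The difference between the two encodings can be checked directly. The following is a minimal Python 3 sketch that uses only the bytes quoted in the question; the variable names and the hexdump helper are illustrative and not part of the original answer.

```python
def hexdump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)

# Bytes quoted in the question for the garbled "ê".
stored = bytes.fromhex("C3 83 C6 92 C3 82 C2 AA")

# Interpret the bytes as UTF-8: this yields the four characters Ã ƒ Â ª.
text = stored.decode("utf-8")

# ISO-8859-1 has no ƒ (U+0192), so it is replaced by "?" (0x3F).
as_latin1 = text.encode("iso-8859-1", errors="replace")

# cp1252 maps ƒ to byte 0x83, so no information is lost.
as_cp1252 = text.encode("cp1252")

print(hexdump(as_latin1))  # prints C3 3F C2 AA
print(hexdump(as_cp1252))  # prints C3 83 C2 AA
```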

Go through another round of treating the bytes as UTF-8 and encoding to cp1252, and you get:

C3 AA

which is finally the UTF-8 encoding of ê.
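To see how the data could have ended up in this state, the two rounds can also be run forwards. This is a sketch in Python 3 under the assumption that each corruption step read UTF-8 bytes as cp1252 and stored the result as UTF-8 again; the function names misread_once and undo_once are made up for illustration.

```python
def misread_once(data: bytes) -> bytes:
    """Treat UTF-8 bytes as cp1252 text, then store that text as UTF-8."""
    return data.decode("cp1252").encode("utf-8")

def undo_once(data: bytes) -> bytes:
    """Reverse one misreading: decode as UTF-8, then encode as cp1252."""
    return data.decode("utf-8").encode("cp1252")

original = "fête".encode("utf-8")               # 66 C3 AA 74 65

# Two misreadings reproduce the bytes seen in the database.
garbled = misread_once(misread_once(original))
print(garbled.decode("utf-8"))                  # prints fÃƒÂªte

# Two reverse rounds restore the original string.
repaired = undo_once(undo_once(garbled))
print(repaired.decode("utf-8"))                 # prints fête
```

Running the same two misreadings on ¿ (C2 BF) yields C3 83 E2 80 9A C3 82 C2 BF. That is close to the sequence quoted in the question but not identical (the question shows C8 as the first byte and no trailing BF), which may be a transcription slip.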

Note that even if you serve a non-XML HTML page explicitly as ISO-8859-1, browsers will actually use the cp1252 encoding, for nasty historical reasons.

MySQL has no character set named cp1252. Its latin1 is described in the MySQL documentation as cp1252 West European rather than strict ISO-8859-1, so dumping as latin1 and reloading as utf8 (twice) may work, but the outcome depends on the connection and dump settings in play. A more controllable route is to process the dump outside MySQL, either with a text editor that can save in either encoding or in Python, e.g. open(path, 'rb').read().decode('utf-8').encode('cp1252').decode('utf-8').encode('cp1252').
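The one-liner can be expanded into a small script. The following is a hedged Python 3 sketch, not a definitive tool: the function names, the rounds parameter, and the file paths are assumptions for illustration. It also assumes that the five byte values Python's cp1252 codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) were passed through unchanged as U+0081 and so on when the data was corrupted; if that does not hold for your data, the fallback should be removed.

```python
# Code points that Python's cp1252 codec cannot encode, assumed here to
# stand for the raw bytes of the same value (hypothetical pass-through).
_CP1252_GAPS = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def to_cp1252_bytes(text: str) -> bytes:
    """Encode text as cp1252, passing the five undefined positions through."""
    out = bytearray()
    for ch in text:
        if ord(ch) in _CP1252_GAPS:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")  # raises if ch is not in cp1252
    return bytes(out)

def repair(data: bytes, rounds: int = 2) -> bytes:
    """Undo `rounds` layers of 'UTF-8 read as cp1252' corruption."""
    for _ in range(rounds):
        data = to_cp1252_bytes(data.decode("utf-8"))
    return data

def repair_dump(src_path: str, dst_path: str, rounds: int = 2) -> None:
    """Repair a whole dump file; reads the file into memory at once."""
    with open(src_path, "rb") as src:
        fixed = repair(src.read(), rounds)
    fixed.decode("utf-8")  # sanity check: raises if the result is not UTF-8
    with open(dst_path, "wb") as dst:
        dst.write(fixed)
```

A call such as repair_dump("dump.sql", "dump_fixed.sql") would write the repaired copy to a new file and leave the original dump untouched. ASCII text, including the SQL syntax itself, passes through both rounds unchanged. Any text that was not corrupted in this way will most likely raise an exception rather than be silently altered, so it is worth trying the script on a copy first.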
