Trouble with utf8 characters; what I see is not what I stored


Question


    I tried to use utf8 and ran into trouble.

    I have tried so many things; here are the results I have gotten:

    • ???? instead of Asian characters. Even for European text, I got Se?or for Señor.
    • Strange gibberish (Mojibake?) such as Señor for Señor, or æ–°æµªæ–°é—» for 新浪新闻.
    • Black diamonds, such as Se�or.
    • Finally, I got into a situation where the data was lost, or at least truncated: Se for Señor.
    • Even when I got text to look right, it did not sort correctly.

    What am I doing wrong? How can I fix the code? Can I recover the data, if so, how?

    Solution

    This problem plagues the participants of this forum, and many others. You have listed the 5 main cases of CHARACTER SET troubles.

    Best Practice

    Going forward, it is best to use CHARACTER SET utf8mb4 and COLLATION utf8mb4_unicode_520_ci. (There is a newer version of the unicode collation in the pipeline.) utf8mb4 is a superset of utf8 in that it also handles 4-byte UTF-8 codes, which are needed by Emoji and some Chinese characters.

    Outside of MySQL, "UTF-8" refers to all size encodings, hence effectively the same as MySQL's utf8mb4, not utf8. I will try to use those spellings and capitalizations to distinguish inside versus outside MySQL in the following.
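
To make the utf8 versus utf8mb4 distinction concrete, here is a minimal Python sketch (Python is only used for illustration; the helper names are made up for this example). It counts the UTF-8 bytes per character and checks whether a string would fit in a character set limited to 3 bytes per character, which is how MySQL's utf8 is described above.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes (hypothetical helper)."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


# The last sample is an emoji, which lies outside the 3-byte range.
samples = ["Señor", "新浪新闻", "\U0001F600"]
for s in samples:
    lengths = [utf8_byte_length(ch) for ch in s]
    print(repr(s), lengths, "fits utf8" if fits_mysql_utf8(s) else "needs utf8mb4")
```

Anything that reports "needs utf8mb4" is the kind of text that gets damaged in a utf8 column.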

    Overview of what you should do

    • Have your editor/etc. set to UTF-8.
    • HTML forms should start like <form accept-charset="UTF-8">.
    • Have your bytes encoded as UTF-8.
    • Establish UTF-8 as the encoding being used in the client.
    • Have the column/table declared CHARACTER SET utf8mb4 (Check with SHOW CREATE TABLE.)
    • <meta charset=UTF-8> at the beginning of HTML

    How to support UTF-8 completely in a web application (Formerly called "utf8 all the way through")

    More details for computer languages (and its following sections)

    Test the data

    Viewing the data with a tool or with SELECT cannot be trusted. Too many such clients, especially browsers, try to compensate for incorrect encodings, and show you correct text even if the database is mangled. So, pick a table and column that has some non-English text and do

    SELECT col, HEX(col) FROM tbl WHERE ...
    

    The HEX for correctly stored UTF-8 will be

    • For a blank space (in any language): 20
    • For English: 4x, 5x, 6x, or 7x
    • For most of Western Europe, accented letters should be Cxyy
    • Cyrillic, Hebrew, and Farsi/Arabic: Dxyy
    • Most of Asia: Exyyzz
    • Emoji and some Chinese characters: F0yyzzww
    • More details
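
If you want to know what HEX(col) should look like for a given string before you look at the table, the patterns above can be reproduced outside MySQL. This is a small Python sketch, with helper names invented for the example; it assumes the value is correctly stored as utf8mb4.

```python
def hex_of(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes, comparable to HEX(col) for good data."""
    return text.encode("utf-8").hex().upper()


def sequence_length(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its first byte."""
    if lead < 0x80:
        return 1          # ASCII: 20 for space, 4x..7x for English letters
    if 0xC2 <= lead <= 0xDF:
        return 2          # Cxyy / Dxyy
    if 0xE0 <= lead <= 0xEF:
        return 3          # Exyyzz
    if 0xF0 <= lead <= 0xF4:
        return 4          # F0yyzzww
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


for sample in [" ", "a", "ñ", "Я", "新", "\U0001F47D"]:
    encoded = sample.encode("utf-8")
    print(repr(sample), hex_of(sample), sequence_length(encoded[0]))
```

Compare the printed hex with what the SELECT returns; if they differ, the stored bytes are not what you think they are.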

    Specific causes and fixes of the problems seen

    Truncated text (Se for Señor):

    • The bytes to be stored are not encoded as utf8mb4. Fix this.
    • Also, check that the connection during reading is UTF-8.
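
The truncation case can be imitated in a few lines of Python. This is only a simulation under the assumption that the client sent latin1 bytes while claiming UTF-8, and that the text is cut at the first byte that is not valid UTF-8; the function name is invented for the example and nothing here talks to MySQL.

```python
def keep_valid_utf8_prefix(raw: bytes) -> str:
    """Return the leading part of raw that is valid UTF-8, dropping the rest."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8.
        return raw[:err.start].decode("utf-8")


latin1_bytes = "Señor".encode("latin-1")   # b'Se\xf1or', not valid UTF-8
utf8_bytes = "Señor".encode("utf-8")       # b'Se\xc3\xb1or'

print(keep_valid_utf8_prefix(latin1_bytes))  # Se
print(keep_valid_utf8_prefix(utf8_bytes))    # Señor
```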

    Black Diamonds with question marks (Se�or for Señor); one of these cases exists:

    Case 1 (original bytes were not UTF-8):

    • The bytes to be stored are not encoded as utf8. Fix this.
    • The connection (or SET NAMES) for the INSERT and the SELECT was not utf8/utf8mb4. Fix this.
    • Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).

    Case 2 (original bytes were UTF-8):

    • The connection (or SET NAMES) for the SELECT was not utf8/utf8mb4. Fix this.
    • Also, check that the column in the database is CHARACTER SET utf8 (or utf8mb4).

    Black diamonds occur only when the browser is set to <meta charset=UTF-8>.
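
A black diamond is what a UTF-8 decoder shows for a byte it cannot interpret. The sketch below reproduces it in Python, assuming the bytes that reach the browser are latin1 while the page is read as UTF-8; the function name is made up for the example.

```python
def render_as_utf8(raw: bytes) -> str:
    """Decode as a UTF-8 page would, replacing invalid bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


latin1_bytes = "Señor".encode("latin-1")   # the ñ is the single byte F1
shown = render_as_utf8(latin1_bytes)

print(shown)                    # Se�or
print(shown == "Se\ufffdor")    # True
```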

    Question Marks (regular ones, not black diamonds) (Se?or for Señor):

    • The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
    • The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this. (Check with SHOW CREATE TABLE.)
    • Also, check that the connection during reading is UTF-8.
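
Regular question marks appear when text is converted into a character set that has no code for some of the characters. Python's "replace" error handler does the same substitution, so it can serve as a rough stand-in; this is an illustration, not MySQL's actual conversion code, and the function name is hypothetical.

```python
def convert_to_charset(text: str, charset: str) -> bytes:
    """Encode text, substituting '?' for anything the charset cannot hold."""
    return text.encode(charset, errors="replace")


print(convert_to_charset("Señor", "ascii"))       # b'Se?or'
print(convert_to_charset("新浪新闻", "latin-1"))   # b'????'
print(convert_to_charset("Señor", "latin-1"))     # b'Se\xf1or'
```

Once a character has become a question mark the original is gone; the hex of the stored value will show 3F.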

    Mojibake (Señor for Señor): (This discussion also applies to Double Encoding, which is not necessarily visible.)

    • The bytes to be stored need to be UTF-8-encoded. Fix this.
    • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
    • The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
    • HTML should start with <meta charset=UTF-8>.
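
Mojibake is UTF-8 bytes read as if they were a single-byte encoding. The following Python sketch reproduces both examples from the question and reverses them, assuming the wrong encoding was cp1252 (the usual meaning of latin1 in this context); the function names are invented for the example.

```python
def make_mojibake(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_mojibake(garbled: str) -> str:
    """Reverse the misreading; only works if no bytes were lost on the way."""
    return garbled.encode("cp1252").decode("utf-8")


for original in ["Señor", "新浪新闻"]:
    garbled = make_mojibake(original)
    print(original, "->", garbled, "->", undo_mojibake(garbled))
```

The round trip only succeeds for text whose UTF-8 bytes all exist in cp1252; a few byte values are undefined there, so treat this as a diagnostic aid rather than a general repair tool.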

    If the data looks correct, but won't sort correctly, then either you have picked the wrong collation, or there is no collation that suits your need, or you have Double Encoding.
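
To see why the choice of collation changes the order, compare a plain code point sort with an accent-insensitive one. The key function below is a crude stand-in written for this example (strip accents, ignore case); it is not MySQL's collation algorithm, and the word list is invented.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Sort key that ignores accents and case (rough approximation only)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["Sezor", "Señor", "Senor", "Semor"]

# Code point order puts ñ (U+00F1) after z.
print(sorted(words))
# ['Semor', 'Senor', 'Sezor', 'Señor']

# Treating ñ like n puts it next to Senor instead.
print(sorted(words, key=accent_insensitive_key))
# ['Semor', 'Señor', 'Senor', 'Sezor']

# A double-encoded value produces a different key, so it sorts elsewhere.
print(accent_insensitive_key("Se\u00c3\u00b1or") == accent_insensitive_key("Señor"))
# False
```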

    Double Encoding can be confirmed by doing the SELECT .. HEX .. described above.

    é should come back C3A9, but instead shows C383C2A9
    The Emoji 
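
The double-encoded hex can be derived by hand, or with a short Python sketch like the one below. It assumes the second encoding step treated the UTF-8 bytes as cp1252 text; the function names are made up, and the emoji is just one chosen for illustration.

```python
def stored_hex(text: str) -> str:
    """Hex of correctly stored UTF-8."""
    return text.encode("utf-8").hex().upper()


def double_encoded_hex(text: str) -> str:
    """Hex after the UTF-8 bytes were misread as cp1252 and encoded again."""
    misread = text.encode("utf-8").decode("cp1252")
    return misread.encode("utf-8").hex().upper()


def repair_double_encoded(hex_string: str) -> str:
    """Undo one round of double encoding, given the hex of the stored value."""
    garbled = bytes.fromhex(hex_string).decode("utf-8")
    return garbled.encode("cp1252").decode("utf-8")


print(stored_hex("é"), double_encoded_hex("é"))      # C3A9 C383C2A9
alien = "\U0001F47D"                                  # illustrative emoji
print(stored_hex(alien), double_encoded_hex(alien))  # F09F91BD C3B0C5B8E28098C2BD
print(repair_double_encoded("C383C2A9"))             # é
```

If the hex from your table matches the second column rather than the first, the data is double encoded.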
                            
