Using utf8mb4 with PHP and MySQL

Question

I have read that MySQL >= 5.5.3 fully supports every possible character if you use the utf8mb4 encoding for a table/column: http://mathiasbynens.be/notes/mysql-utf8mb4

Looks nice. Only I noticed that the mb_* functions in PHP do not support it! I cannot find utf8mb4 anywhere in this list: http://php.net/manual/en/mbstring.supported-encodings.php

I have not only read about it, I have also run a test.

I have added data to a mysql utf8mb4 table using a php script where the internal encoding was set to UTF-8: mb_internal_encoding("UTF-8");

And, as expected, the characters look garbled once they are in the database.

Any idea how I can make PHP and MySQL talk the same encoding (possibly a 4-byte one) and still have full support for every language in the world?

Also why is utf8mb4 different from utf32?

Recommended answer

MySQL's utf8 encoding is not actual UTF-8. It's an encoding that is kinda like UTF-8, but only supports a subset of what UTF-8 supports: it uses at most 3 bytes per character, so characters above U+FFFF (such as most emoji) cannot be stored in it. utf8mb4 is actual UTF-8. This difference is an internal implementation detail of MySQL. Both look like UTF-8 on the PHP side: whether you use utf8 or utf8mb4, PHP will get valid UTF-8.
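
A quick way to see whether a given string is affected by the 3-byte limit is to look for code points above U+FFFF, which are exactly the characters that take 4 bytes in UTF-8. The sketch below uses a hypothetical helper, needs_utf8mb4(), written only for this illustration; it is not part of PHP or MySQL and assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not a PHP or MySQL API): returns true if the UTF-8
// string contains any code point above U+FFFF. Those characters take 4 bytes
// in UTF-8 and cannot be stored in a MySQL column that uses the 3-byte "utf8".
function needs_utf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \x{...} refers to Unicode code points rather than single bytes.
    // Invalid UTF-8 makes preg_match() return false, which yields false here.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

var_dump(needs_utf8mb4("Café 中文")); // bool(false): all characters are in the BMP
var_dump(needs_utf8mb4("Hi 😀"));     // bool(true): U+1F600 needs 4 bytes
```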

What you need to make sure is that the connection encoding between PHP and MySQL is set to utf8mb4. If it's set to utf8, characters that need 4 bytes will not make it through the connection intact. You set this connection encoding using mysql_set_charset() (mysqli_set_charset() with the mysqli extension), the charset parameter of the PDO DSN, or whatever other method is appropriate for your database API of choice.
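
The old mysql_* extension, including mysql_set_charset(), was removed in PHP 7.0, so the sketch below shows the same idea with PDO (charset=utf8mb4 in the DSN) and with mysqli::set_charset(). The helper names utf8mb4_dsn(), connect_pdo() and connect_mysqli() are hypothetical, and host, database name and credentials are placeholders; the connection functions are only defined here, not called, since they need a running server.

```php
<?php
// Sketch only: helper names are hypothetical and all connection details are
// placeholders. The table/column must also use utf8mb4; this covers only the
// PHP <-> MySQL connection encoding.

// Builds a PDO MySQL DSN whose charset parameter sets the connection to utf8mb4.
function utf8mb4_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// PDO: the connection charset is passed as part of the DSN.
function connect_pdo(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8mb4_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// mysqli: the connection charset is set right after connecting.
function connect_mysqli(string $host, string $dbname, string $user, string $pass): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set utf8mb4: ' . $db->error);
    }
    return $db;
}

echo utf8mb4_dsn('localhost', 'test'), PHP_EOL;
// mysql:host=localhost;dbname=test;charset=utf8mb4
```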

mb_internal_encoding() just sets the default value for the $encoding parameter that the mb_* functions take. It has nothing to do with MySQL.
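
A small illustration of that point, which also shows why mbstring has no separate utf8mb4 entry: mbstring's UTF-8 is full UTF-8 and already handles 4-byte characters. Setting the internal encoding only changes which encoding mb_strlen() assumes when none is passed.

```php
<?php
// mb_internal_encoding() only changes the default $encoding used by mb_*
// functions; it does not affect what is sent to or received from MySQL.
mb_internal_encoding('UTF-8');

$s = "a😀"; // "a" (1 byte) + U+1F600 (4 bytes in UTF-8)

echo strlen($s), PHP_EOL;             // 5: byte count, mbstring not involved
echo mb_strlen($s), PHP_EOL;          // 2: uses the internal encoding (UTF-8)
echo mb_strlen($s, 'UTF-8'), PHP_EOL; // 2: same thing, encoding passed explicitly
echo mb_strlen($s, '8bit'), PHP_EOL;  // 5: every byte counted as one character
```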

utf8mb4 (UTF-8) and utf32 (UTF-32) are different ways of encoding the same Unicode characters. UTF-8 uses a minimum of 1 byte for a character and a maximum of 4. UTF-32 always uses 4 bytes for every character. UTF-16 uses a minimum of 2 bytes and a maximum of 4.
Due to its variable length, UTF-8 has a little bit of overhead for non-ASCII text. A character that takes 2 bytes in UTF-16 may take 3 bytes in UTF-8 (most CJK characters, for example), while characters that need 4 bytes in UTF-8 also need 4 in UTF-16. On the other hand, UTF-16 never uses fewer than 2 bytes. If you're storing lots of Asian text, UTF-16 may use less storage. If most of your text is English/ASCII, UTF-8 uses less storage. UTF-32 never uses less storage than the other two.
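
The size differences described above can be checked with mbstring by converting a few sample characters and counting the resulting bytes. encoded_length() is a hypothetical helper written for this illustration, and the explicit little-endian variants are used so that no byte-order mark ends up in the count.

```php
<?php
// Hypothetical helper: number of bytes a UTF-8 string occupies after
// conversion to $encoding. The source literals are UTF-8, hence 'UTF-8' as
// the "from" encoding.
function encoded_length(string $utf8, string $encoding): int
{
    return strlen(mb_convert_encoding($utf8, $encoding, 'UTF-8'));
}

// 'A' is ASCII, 'é' is Latin-1, '中' is CJK (BMP), '😀' is above U+FFFF.
foreach (['A', 'é', '中', '😀'] as $ch) {
    printf(
        "%s  UTF-8: %d  UTF-16: %d  UTF-32: %d\n",
        $ch,
        encoded_length($ch, 'UTF-8'),
        encoded_length($ch, 'UTF-16LE'),
        encoded_length($ch, 'UTF-32LE')
    );
}
// A  UTF-8: 1  UTF-16: 2  UTF-32: 4
// é  UTF-8: 2  UTF-16: 2  UTF-32: 4
// 中  UTF-8: 3  UTF-16: 2  UTF-32: 4
// 😀  UTF-8: 4  UTF-16: 4  UTF-32: 4
```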
