Using utf8mb4 with PHP and MySQL
Question
I have read that MySQL >= 5.5.3 fully supports every possible character if you use the encoding utf8mb4 for a certain table/column: http://mathiasbynens.be/notes/mysql-utf8mb4
That looks nice. However, I noticed that the mb_* functions in PHP do not seem to support it! I cannot find it anywhere in the list: http://php.net/manual/en/mbstring.supported-encodings.php
Not only have I read things but I also made a test.
I have added data to a mysql utf8mb4 table using a php script where the internal encoding was set to UTF-8: mb_internal_encoding("UTF-8");
and, as expected, the characters look messy once they are in the db.
Any idea how I can make PHP and MySQL talk the same encoding (possibly a 4-byte one) and still have FULL support for any world language?
Also why is utf8mb4 different from utf32?
Recommended answer
MySQL's utf8 encoding is not actual UTF-8. It's an encoding that is kind of like UTF-8, but only supports a subset of what UTF-8 supports: characters that need at most 3 bytes. utf8mb4 is actual UTF-8. This difference is an internal implementation detail of MySQL. Both look like UTF-8 on the PHP side. Whether you use utf8 or utf8mb4, PHP will get valid UTF-8 in both cases.
What you need to make sure is that the connection encoding between PHP and MySQL is set to utf8mb4. If it's set to utf8, MySQL will not support all characters. You set this connection encoding using mysqli_set_charset() (or mysql_set_charset() with the legacy mysql extension, which was removed in PHP 7), the PDO charset DSN connection parameter, or whatever other method is appropriate for your database API of choice.
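As a minimal sketch of the PDO route, the helper below builds a DSN whose charset parameter requests utf8mb4. The function name buildMysqlDsn, the host, the database name, and the credentials are placeholders chosen for illustration; none of them come from a library or from the question.

```php
<?php
// Hypothetical helper (not a library function): builds a PDO MySQL DSN
// that asks for utf8mb4 as the connection encoding.
function buildMysqlDsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Placeholder host and database name.
$dsn = buildMysqlDsn('localhost', 'mydb');
echo $dsn, PHP_EOL;

// Connecting would then look like this (placeholder credentials; left
// commented out because it needs a running MySQL server):
// $pdo = new PDO($dsn, 'user', 'password', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);
```

With mysqli, the equivalent step would be calling mysqli_set_charset($link, 'utf8mb4') right after connecting, where $link is your own connection handle.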
mb_internal_encoding just sets the default value for the $encoding parameter that all mb_* functions have. It has nothing to do with MySQL.
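To illustrate, here is a small sketch, assuming the mbstring extension is loaded and PHP 7 or later (for the \u{...} escape). It suggests that the internal encoding only supplies a default for the encoding argument, and that a 4-byte UTF-8 character is handled by the mb_* functions under the plain 'UTF-8' name, so there is no separate "utf8mb4" entry to look for on the PHP side.

```php
<?php
// Set the default encoding used by mb_* functions when none is passed.
mb_internal_encoding('UTF-8');

// One code point outside the Basic Multilingual Plane (an emoji),
// which takes 4 bytes in UTF-8.
$emoji = "\u{1F600}";

$bytes         = strlen($emoji);             // counts bytes
$charsDefault  = mb_strlen($emoji);          // falls back to the internal encoding
$charsExplicit = mb_strlen($emoji, 'UTF-8'); // encoding passed explicitly

printf("bytes=%d default=%d explicit=%d\n", $bytes, $charsDefault, $charsExplicit);
// prints: bytes=4 default=1 explicit=1
```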
UTF-8 and UTF-32 differ in how they encode characters. UTF-8 uses a minimum of 1 byte for a character and a maximum of 4. UTF-32 always uses 4 bytes for every character. UTF-16 uses a minimum of 2 bytes and a maximum of 4.
Due to its variable length, UTF-8 has a little bit of overhead. A character which can be encoded in 2 bytes in UTF-16 may take 3 bytes in UTF-8; on the other hand, UTF-16 never uses less than 2 bytes. If you're storing lots of Asian text, UTF-16 may use less storage. If most of your text is English/ASCII, UTF-8 uses less storage. UTF-32 always uses the most storage.
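The size differences can be checked with a short sketch like the one below. It assumes the mbstring extension and PHP 7 or later; encodedLengths is a hypothetical helper name, and the four sample characters are arbitrary picks for illustration.

```php
<?php
// Hypothetical helper: returns the byte length of one UTF-8 encoded
// character in UTF-8, UTF-16 and UTF-32 (big-endian, no byte order mark).
function encodedLengths(string $utf8Char): array
{
    return [
        'UTF-8'  => strlen($utf8Char),
        'UTF-16' => strlen(mb_convert_encoding($utf8Char, 'UTF-16BE', 'UTF-8')),
        'UTF-32' => strlen(mb_convert_encoding($utf8Char, 'UTF-32BE', 'UTF-8')),
    ];
}

// Sample characters: ASCII letter, accented Latin letter, CJK ideograph, emoji.
$samples = [
    'U+0041'  => "A",
    'U+00E9'  => "\u{E9}",
    'U+65E5'  => "\u{65E5}",
    'U+1F600' => "\u{1F600}",
];

foreach ($samples as $label => $char) {
    $len = encodedLengths($char);
    printf("%-8s UTF-8=%d UTF-16=%d UTF-32=%d\n",
        $label, $len['UTF-8'], $len['UTF-16'], $len['UTF-32']);
}
// U+0041   UTF-8=1 UTF-16=2 UTF-32=4
// U+00E9   UTF-8=2 UTF-16=2 UTF-32=4
// U+65E5   UTF-8=3 UTF-16=2 UTF-32=4
// U+1F600  UTF-8=4 UTF-16=4 UTF-32=4
```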