The ultimate emoji encoding scheme
Question
This is my environment: the client is an iOS app, and the server runs PHP and MySQL.
Data is sent from the client to the server via HTTP POST.
Data is sent from the server to the client as JSON.
I would like to add support for emojis, or any utf8mb4 character in general. I'm looking for the right way to deal with this in my scenario.
My questions are:
- Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
- If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
- Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and Java for the future Android version)?
Right now I have the DB with utf8mb4, but I get errors when trying to store a raw emoji. On the other hand, I can store non-utf8 symbols such as ¿ or á.
When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4 the json_decode function doesn't work), and then such symbols are encoded (e.g., ¿ is encoded to \u00bf).
Recommended answer
MySQL's utf8 charset is not actually UTF-8; it is a subset of UTF-8 that only supports the Basic Multilingual Plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL's utf8mb4 is actual UTF-8, which can encode all those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, there is no such thing as "utf8mb4" outside MySQL. An HTTP POST request body can carry any raw bytes, so if your client sends UTF-8 encoded data you're fine.
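To illustrate the point about POST bodies, here is a minimal Python sketch (not the asker's iOS or PHP code) of how an emoji travels in a form-encoded body. The field name `message` and its value are hypothetical; the sketch assumes an `application/x-www-form-urlencoded` body, where the UTF-8 bytes are percent-encoded.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form field and message, used only for illustration.
form = {"message": "hi \U0001F600"}  # U+1F600 is the grinning-face emoji

# urlencode() percent-encodes the UTF-8 bytes of each value by default.
body = urlencode(form)
print(body)

# What travels in the request body is plain ASCII bytes.
wire_bytes = body.encode("ascii")

# The receiving side reverses the percent-encoding and decodes as UTF-8.
decoded = parse_qs(wire_bytes.decode("ascii"))
print(decoded["message"][0])
```

The emoji needs no special treatment: it is simply four UTF-8 bytes, the same as it would be in any other UTF-8 text.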
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Yes.
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4) for all that is holy.
When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8
Well, there's your problem: channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL.
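As a rough way to see which characters are affected, the following Python sketch checks whether every character in a string sits at or below U+FFFF, which is the limit the answer gives for MySQL's utf8 charset. The helper name `fits_three_byte_utf8` is made up for this example, and the check only looks at code points; it does not talk to MySQL.

```python
def fits_three_byte_utf8(text: str) -> bool:
    """Return True if every character is at or below U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


samples = [
    "\u00bf",      # inverted question mark, from the question
    "\u00e1",      # a with acute accent, from the question
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning-face emoji
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(
        f"U+{ord(ch):04X}: {len(encoded)} byte(s) in UTF-8, "
        f"within U+FFFF: {fits_three_byte_utf8(ch)}"
    )
```

This also explains why ¿ and á could be stored while the emoji failed: the first two are ordinary UTF-8 characters well inside the basic plane, and only the emoji needs four bytes.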
if I get them in utf8mb4 the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8.
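The JSON round trip can be sketched in Python as a stand-in; this is not PHP's implementation, only an illustration that JSON itself has no difficulty with code points above U+FFFF. By default Python's `json.dumps` escapes non-ASCII characters as `\uXXXX` sequences (an emoji becomes a surrogate pair), and PHP's `json_encode` is expected to behave comparably unless `JSON_UNESCAPED_UNICODE` is passed.

```python
import json

emoji = "\U0001F600"  # grinning-face emoji, a code point above U+FFFF

# Default behaviour: non-ASCII characters are escaped, and a code point
# above U+FFFF is written as a UTF-16 surrogate pair.
escaped = json.dumps(emoji)
print(escaped)

# Decoding the escaped form restores the original character.
assert json.loads(escaped) == emoji

# With ensure_ascii=False the emoji is emitted raw; the JSON text is then
# sent as UTF-8 bytes and decodes back to the same character.
raw = json.dumps(emoji, ensure_ascii=False)
raw_bytes = raw.encode("utf-8")
assert json.loads(raw_bytes.decode("utf-8")) == emoji

# The question's example: the inverted question mark escapes to \u00bf.
print(json.dumps("\u00bf"))
```

Both forms are valid JSON, so a client in Objective-C or Java should decode either one to the same string.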