The ultimate emoji encoding scheme


Question

This is my environment: client -> iOS app, server -> PHP and MySQL.

Data is sent from the client to the server via HTTP POST.

Data is returned from the server to the client as JSON.

I would like to add support for emojis, or any utf8mb4 character in general. I'm looking for the right way to deal with this in my scenario.

My questions are as follows:

  1. Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?

  2. If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?

  3. Should I try to work in the DB with utf8mb4, or is it safer/better/more supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and Java for the future Android version)?

Right now I have the DB with utf8mb4, but I get errors when trying to store a raw emoji. On the other hand, I can store non-ASCII symbols such as ¿ or á.

When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4 the json_decode function doesn't work), then such symbols are encoded (e.g., ¿ is encoded to \u00bf).

Recommended answer

MySQL's utf8 charset is not actually UTF-8; it is a subset of UTF-8 that supports only the Basic Multilingual Plane (characters up to U+FFFF, at most three bytes each). Most emoji use code points above U+FFFF, which need four bytes in UTF-8. MySQL's utf8mb4 is actual UTF-8, which can encode all of those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
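The three-byte versus four-byte distinction can be checked directly. The following is a minimal sketch in Python, used here only as a neutral illustration (the thread itself concerns PHP, Objective-C and Java, where the byte counts are identical). The helper names `utf8_len` and `fits_in_3_byte_utf8` are made up for this example.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """True if every code point is at most U+FFFF (needs <= 3 UTF-8 bytes).

    This mirrors the limit of MySQL's legacy `utf8` charset described above;
    it is an illustration, not a call into MySQL.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# Print the code point and UTF-8 length of a few sample characters.
for ch in ["a", "á", "¿", "€", "😀"]:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

print(fits_in_3_byte_utf8("¿á€"))    # characters from the Basic Multilingual Plane
print(fits_in_3_byte_utf8("hi 😀"))  # contains a code point above U+FFFF
```

Characters such as ¿ and á stay well within three bytes, which is consistent with the asker being able to store them while the emoji failed.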

Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?

Again, there is no such thing as "utf8mb4" outside MySQL. HTTP POST requests can carry any raw bytes, so if your client sends UTF-8 encoded data you're fine.
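To illustrate that a form-encoded POST body carries an emoji as ordinary UTF-8 bytes, here is a small Python sketch using only the standard library. It does not send a request; it builds and re-parses an `application/x-www-form-urlencoded` body. The field name `message` and the sample text are assumptions for the example, and the actual iOS client and PHP server would use their own equivalents.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form field holding text with an emoji and accented characters.
message = "hello 😀 ¿qué tal?"

# urlencode percent-encodes the UTF-8 bytes of each value (UTF-8 is the default).
body = urlencode({"message": message})
print(body)

# Parsing the body back (also UTF-8 by default) recovers the original string.
decoded = parse_qs(body)["message"][0]
print(decoded == message)
```

The emoji appears in the body as the four percent-encoded bytes `%F0%9F%98%80`, with no special "utf8mb4" handling on the HTTP side.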

If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?

Yes.

Should I try to work in the DB with utf8mb4, or is it safer/better/more supported to work in utf8 and encode symbols?

God no. For all that is holy, use raw UTF-8 (utf8mb4).

When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8

Well, there's your problem: channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL, including the connection character set (for example, SET NAMES utf8mb4 instead of SET CHARACTER SET utf8).

If I get them in utf8mb4 the json_decode function doesn't work

You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as the input is valid UTF-8.
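The underlying point is a property of JSON itself rather than of any one library: a character can appear in JSON either as raw UTF-8 or as a `\uXXXX` escape (a surrogate pair for code points above U+FFFF), and both decode to the same string. The sketch below shows this with Python's standard `json` module as a stand-in; it is not PHP's `json_encode`/`json_decode`, whose exact output options are not shown in the thread.

```python
import json

emoji = "\U0001F600"  # the single code point U+1F600 (😀)

# By default Python escapes non-ASCII characters, using a surrogate pair here.
escaped = json.dumps(emoji)
# With ensure_ascii=False the character is emitted as raw UTF-8 text instead.
raw = json.dumps(emoji, ensure_ascii=False)

print(escaped)
print(raw)

# Both JSON representations decode back to the same one-character string.
print(json.loads(escaped) == json.loads(raw) == emoji)

# A character inside the Basic Multilingual Plane needs only one escape,
# matching the \u00bf form mentioned in the question.
print(json.dumps("¿"))
```

Either form is valid JSON, so escaped output such as `\u00bf` in the question is not a sign of data loss by itself.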
