Encoding emoji in Erlang
Question
Suppose I have a binary:
Message = <<"string containing emoji">>.
How do I properly encode it as Unicode? I tried:
Encoded = <<Message/utf16>>.
I get this warning when compiling the file:
Warning: binary construction will fail with a 'badarg' exception (invalid Unicode code point in a utf8/utf16/utf32 segment)
I tried this with /utf8 as well and got the same warning.
Recommended answer
Assuming that the binary you start with is encoded as UTF-8 and that you need little-endian UTF-16, this should work:
unicode:characters_to_binary(<<"string containing emoji">>, utf8, {utf16, little})
See the documentation for the unicode module for more information.
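As a rough sketch of how that call might be wrapped, the module below handles the three return shapes of unicode:characters_to_binary/3 (a binary on success, or an error or incomplete tuple). The module name emoji_utf16, the function to_utf16le/1, and the error atoms are hypothetical names chosen for illustration; they are not part of the original answer.

```erlang
-module(emoji_utf16).
-export([to_utf16le/1]).

%% Hypothetical helper: convert a UTF-8 encoded binary to little-endian UTF-16.
%% unicode:characters_to_binary/3 returns a binary on success, or a tagged
%% tuple when the input is invalid or ends in the middle of a character.
to_utf16le(Utf8) when is_binary(Utf8) ->
    case unicode:characters_to_binary(Utf8, utf8, {utf16, little}) of
        Bin when is_binary(Bin) ->
            {ok, Bin};
        {error, _Converted, _Rest} ->
            {error, invalid_utf8};
        {incomplete, _Converted, _Rest} ->
            {error, incomplete_utf8}
    end.
```

For example, emoji_utf16:to_utf16le(<<240,159,153,140>>) (the UTF-8 bytes of U+1F64C) should return {ok, <<61,216,76,222>>}, which is the surrogate pair D83D DE4C with each 16-bit unit byte-swapped.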
The reason why <<Message/utf16>> fails is that the utf8, utf16 and utf32 specifiers in bit syntax encode a single code point, not an entire string. So to encode the character U+1F64C, you could use:
2> <<16#1f64c/utf8>>.
<<240,159,153,140>>
3> <<16#1f64c/utf16>>.
<<"\330=\336L">>
4> <<16#1f64c/utf32>>.
<<0,1,246,76>>
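Because each utf16 segment takes one code point, a whole string can also be encoded by applying the segment to every code point in turn. The following is a minimal sketch of that idea, assuming the input is a valid UTF-8 binary; the module name emoji_segments and the function encode_utf16le/1 are hypothetical. Note that a bare utf16 segment is big-endian, so the sketch uses utf16-little to match the little-endian output discussed above.

```erlang
-module(emoji_segments).
-export([encode_utf16le/1]).

%% Hypothetical helper: Utf8 is assumed to be a valid UTF-8 binary.
%% unicode:characters_to_list/2 yields a list of code points, and each one
%% is then encoded with a single utf16-little segment. If the input is not
%% valid UTF-8, characters_to_list/2 returns a tuple and this function crashes.
encode_utf16le(Utf8) when is_binary(Utf8) ->
    CodePoints = unicode:characters_to_list(Utf8, utf8),
    << <<CP/utf16-little>> || CP <- CodePoints >>.
```

In practice unicode:characters_to_binary/3 is the more direct route; this version only illustrates the one-code-point-per-segment behaviour.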