Reference: Why are my "special" Unicode characters encoded weird using json_encode?

Problem description

When using "special" Unicode characters they come out as weird garbage when encoded to JSON:

php > echo json_encode(['foo' => '馬']);
{"foo":"\u99ac"}

Why? Have I done something wrong with my encodings?

(This is a reference question to clarify this topic once and for all, since it comes up again and again.)

Recommended answer

First of all: There's nothing wrong here. This is how characters can be encoded in JSON. It is in the official standard. It is based on how string literals can be formed in ECMAScript (section 7.8.4 "String Literals") and is described as such:

Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. [...] So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

In short: Any character can be encoded as \u...., where .... is the Unicode code point of the character (or the code point of half of a UTF-16 surrogate pair, for characters outside the BMP).
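To make the surrogate-pair case concrete, here is a minimal sketch. It assumes PHP 7.0 or later so that the `"\u{...}"` string syntax is available; the variable names and the choice of U+1F600 as a sample non-BMP character are illustrative only.

```php
<?php
// U+99AC lies inside the BMP, so a single \uXXXX escape is enough.
$bmp = json_encode("\u{99AC}");

// U+1F600 lies outside the BMP, so it is written as a UTF-16 surrogate pair.
$astral = json_encode("\u{1F600}");

echo $bmp, "\n";     // "\u99ac"
echo $astral, "\n";  // "\ud83d\ude00"

// Decoding the pair gives back the single original code point.
var_dump(json_decode($astral) === "\u{1F600}"); // bool(true)
```

Each half of the pair is not a character by itself; only the two escapes together identify the code point.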

"馬"
"\u99ac"

These two string literals represent the exact same character; they're absolutely equivalent. When these string literals are parsed by a compliant JSON parser, they will both result in the string "馬". They don't look the same, but they mean the same thing in the JSON data encoding format.
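This equivalence can be checked directly with `json_decode`. The sketch below again assumes PHP 7.0 or later for the `"\u{...}"` syntax, which is used so the example does not depend on the encoding of the source file; the variable names are made up for illustration.

```php
<?php
// Two JSON documents: one with the literal character, one with the escape.
// Single quotes keep the backslash intact so that json_decode() sees \u99ac.
$literal = '"' . "\u{99AC}" . '"';
$escaped = '"\u99ac"';

$a = json_decode($literal);
$b = json_decode($escaped);

var_dump($a === $b);    // bool(true)
var_dump(bin2hex($a));  // string(6) "e9a6ac", the UTF-8 bytes of U+99AC
```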

By default, PHP's json_encode encodes non-ASCII characters using \u.... escape sequences. Technically it doesn't have to, but it does. And the result is perfectly valid. If you prefer to have literal characters in your JSON instead of escape sequences, you can set the JSON_UNESCAPED_UNICODE flag in PHP 5.4 or higher:

php > echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE);
{"foo":"馬"}

To emphasise: this is just a preference; the flag is not necessary in any way to transport "Unicode characters" in JSON.
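As a rough illustration of why the flag is only cosmetic, the following sketch encodes the same array both ways and decodes each result. It assumes PHP 7.0 or later (for `"\u{...}"`), and `$data`, `$escaped` and `$literal` are names chosen for this example.

```php
<?php
$data = ['foo' => "\u{99AC}"];

$escaped = json_encode($data);                         // {"foo":"\u99ac"}
$literal = json_encode($data, JSON_UNESCAPED_UNICODE); // {"foo":"馬"}

// The two documents differ byte for byte...
var_dump($escaped === $literal); // bool(false)

// ...but they decode to identical data.
var_dump(json_decode($escaped, true) === json_decode($literal, true)); // bool(true)

// The default output contains only ASCII bytes.
var_dump(preg_match('/^[\x00-\x7F]*$/', $escaped) === 1); // bool(true)
```

The escaped form costs a few extra bytes but survives any transport that handles ASCII, while the unescaped form is easier for humans to read.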
