How does `JSONDecoder` know which encoding to use?


Question

Having read Joel on Encoding like a good boy, I find myself perplexed by the workings of Foundation's JSONDecoder, whose init and decode methods take no encoding value. Looking through the docs, I see the instance variable dataDecodingStrategy, which is perhaps where the encoding-guessing magic happens...?

Am I missing something here? Shouldn't JSONDecoder need to know the encoding of the data it receives? I realize that the JSON standard requires this data to be UTF-8 encoded, but can JSONDecoder be making that assumption? I'm confused.

Recommended answer

RFC 8259 (from 2017) states:

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.

The older RFC 7159 (from 2014) and RFC 7158 (from 2013) only stated that

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

And RFC 4627 (from 2006, the oldest one that I could find):

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters, it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
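The null-pattern rule quoted above can be sketched in Swift roughly as follows. This is only an illustration of the table in RFC 4627, section 3, not how JSONDecoder or JSONSerialization is actually implemented; guessJSONEncoding is a hypothetical helper name, and the sketch assumes there is no byte order mark and that at least four octets are available (shorter input returns nil).

```swift
import Foundation

// Hypothetical helper, not part of Foundation: guesses the Unicode encoding
// of a JSON text from the pattern of null bytes in its first four octets,
// following the table in RFC 4627, section 3.
func guessJSONEncoding(_ data: Data) -> String.Encoding? {
    let bytes = [UInt8](data.prefix(4))
    // Assumption of this sketch: fewer than four octets is not classified.
    guard bytes.count == 4 else { return nil }

    let isNull = bytes.map { $0 == 0 }
    switch (isNull[0], isNull[1], isNull[2], isNull[3]) {
    case (true, true, true, false):    return .utf32BigEndian     // 00 00 00 xx
    case (true, false, true, false):   return .utf16BigEndian     // 00 xx 00 xx
    case (false, true, true, true):    return .utf32LittleEndian  // xx 00 00 00
    case (false, true, false, true):   return .utf16LittleEndian  // xx 00 xx 00
    case (false, false, false, false): return .utf8               // xx xx xx xx
    default:                           return nil                 // not a pattern from the RFC table
    }
}

let sample = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(guessJSONEncoding(sample) == String.Encoding.utf16LittleEndian) // true
```

The rule only works because the first two characters are ASCII, so every byte of those characters other than the ASCII one is zero in UTF-16 and UTF-32.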

JSONDecoder (which uses JSONSerialization under the hood) is able to decode UTF-8, UTF-16, and UTF-32, both little-endian and big-endian. Example:

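import Foundation

// Encode a JSON array as UTF-16 little-endian (no byte order mark).
let data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(data as NSData) // <5b003100 2c002000 32002c00 20003300 5d00>

// JSONDecoder decodes it without being told the encoding.
let a = try! JSONDecoder().decode([Int].self, from: data)
print(a) // [1, 2, 3]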

Since a valid JSON text must start with "[" or "{", the encoding can be determined unambiguously from the first bytes of the data.

I did not find this documented though, and one probably should not rely on it. A future implementation of JSONDecoder might support only the newer standard and require UTF-8.
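If the source encoding is known in advance, one way to avoid depending on this undocumented detection is to transcode to UTF-8 before decoding. The following is a minimal sketch under that assumption; utf8JSONData is a hypothetical helper name, not a Foundation API.

```swift
import Foundation

// Hypothetical helper: re-encodes JSON data from a known source encoding
// to UTF-8, so the decoder only ever sees UTF-8 input.
func utf8JSONData(from data: Data, encoding: String.Encoding) -> Data? {
    // Returns nil if the bytes are not valid in the given encoding.
    guard let text = String(data: data, encoding: encoding) else { return nil }
    return Data(text.utf8)
}

let utf16 = "[1, 2, 3]".data(using: .utf16LittleEndian)!
if let utf8 = utf8JSONData(from: utf16, encoding: .utf16LittleEndian),
   let numbers = try? JSONDecoder().decode([Int].self, from: utf8) {
    print(numbers) // [1, 2, 3]
}
```

This costs an extra copy of the payload, but the result satisfies the UTF-8 requirement of RFC 8259 regardless of what a future JSONDecoder accepts.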
