How should escaped Unicode be handled by JSON parsers and encoders?

Question

The JSON spec allows escaped Unicode in JSON strings (of the form \uXXXX). It specifically mentions a restricted codepoint (a noncharacter) as a valid escaped codepoint. Doesn't this imply that parsers should generate illegal Unicode from strings containing noncharacters and restricted codepoints?

An example:

{ "key": "\uFDD0" }

Decoding this requires either that your parser makes no attempt to interpret the escaped codepoint or that it generates an invalid Unicode string, does it not?

Solution

When you decode, it seems that this would be an appropriate use for the Unicode replacement character, U+FFFD.

From the Unicode Character Database:

  • used to replace an incoming character whose value is unknown or unrepresentable in Unicode
  • compare the use of U+001A as a control character to indicate the substitute function
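One way to picture this suggestion is a small post-processing step over the decoded value. The sketch below is only an illustration in Python, not something the JSON spec mandates: the helper names `is_noncharacter`, `scrub`, and `loads_scrubbed` are made up for this example, and treating lone surrogates the same way as noncharacters is an extra assumption beyond what the answer says. It relies on the fact that Python's standard `json.loads` passes escaped codepoints such as \uFDD0 through unchanged.

```python
import json

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def scrub(value):
    """Recursively replace noncharacters and lone surrogates with U+FFFD."""
    if isinstance(value, str):
        return "".join(
            REPLACEMENT if is_noncharacter(ord(ch)) or is_surrogate(ord(ch)) else ch
            for ch in value
        )
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {scrub(key): scrub(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


def loads_scrubbed(text: str):
    """Hypothetical wrapper: decode JSON, then scrub the decoded strings."""
    return scrub(json.loads(text))


if __name__ == "__main__":
    decoded = loads_scrubbed('{"key": "\\uFDD0"}')
    print([hex(ord(ch)) for ch in decoded["key"]])
```

Whether to substitute, reject the input, or pass the codepoint through is a policy choice for the application; this sketch only shows the substitution option mentioned above.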
