JSON 字符编码 - 浏览器是否支持 UTF-8,还是应该使用数字转义序列? [英] JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?

查看:22
本文介绍了JSON 字符编码 - 浏览器是否支持 UTF-8,还是应该使用数字转义序列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个使用 json 表示其资源的 web 服务,我在思考编码 json 的最佳方法时有点卡住了.读取 json rfc (http://www.ietf.org/rfc/rfc4627.txt) 很明显,首选编码是 utf-8.但是 rfc 还描述了用于指定字符的字符串转义机制.我认为这通常用于转义非 ascii 字符,从而使生成的 utf-8 成为有效的 ascii.

I am writing a webservice that uses json to represent its resources, and I am a bit stuck thinking about the best way to encode the json. Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8. But the rfc also describes a string escaping mechanism for specifying characters. I assume this would generally be used to escape non-ascii characters, thereby making the resulting utf-8 valid ascii.

假设我有一个 json 字符串,其中包含非 ascii 的 unicode 字符(代码点).我的网络服务应该只是 utf-8 编码并返回它,还是应该转义所有那些非 ascii 字符并返回纯 ascii?

So let's say I have a json string that contains unicode characters (code-points) that are non-ascii. Should my webservice just utf-8 encoding that and return it, or should it escape all those non-ascii characters and return pure ascii?

我希望浏览器能够使用 jsonp 或 eval 执行结果.这会影响决定吗?我对各种浏览器对 utf-8 的 javascript 支持缺乏了解.

I'd like browsers to be able to execute the results using jsonp or eval. Does that effect the decision? My knowledge of various browser's javascript support for utf-8 is lacking.

我想澄清一下,我对如何编码结果的主要关注实际上是关于浏览器对结果的处理.我所读到的内容表明,特别是在使用 JSONP 时,浏览器可能对编码敏感.我还没有找到关于这个主题的任何真正好的信息,所以我必须开始做一些测试,看看会发生什么.理想情况下,我只想转义那些需要的几个字符,然后只对结果进行 utf-8 编码.

I wanted to clarify that my main concern about how to encode the results is really about browser handling of the results. What I've read indicates that browsers may be sensitive to the encoding when using JSONP in particular. I haven't found any really good info on the subject, so I'll have to start doing some testing to see what happens. Ideally I'd like to only escape those few characters that are required and just utf-8 encode the results.

推荐答案

JSON 规范要求解码器支持 UTF-8.因此,所有 JSON 解码器都可以像处理数字转义序列一样处理 UTF-8.Javascript 解释器也是如此,这意味着 JSONP 也将处理 UTF-8 编码的 JSON.

The JSON spec requires UTF-8 support by decoders. As a result, all JSON decoders can handle UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for Javascript interpreters, which means JSONP will handle the UTF-8 encoded JSON as well.

JSON 编码器使用数字转义序列的能力只是为您提供了更多选择.您可以选择数字转义序列的一个原因是,如果在您的编码器和预期解码器之间的传输机制不是二进制安全的.

The ability for JSON encoders to use the numeric escape sequences instead just offers you more choice. One reason you may choose the numeric escape sequences would be if a transport mechanism in between your encoder and the intended decoder is not binary-safe.

您可能想要使用数字转义序列的另一个原因是为了防止某些字符出现在流中,例如 <&",如果 JSON 代码放置时没有转义为 HTML,或者浏览器错误地将其解释为 HTML,则它可能会被解释为 HTML 序列.这可以防御 HTML 注入或跨站点脚本(注意:必须在 JSON 中对某些字符进行转义,包括 ").

Another reason you may want to use numeric escape sequences is to prevent certain characters appearing in the stream, such as <, & and ", which may be interpreted as HTML sequences if the JSON code is placed without escaping into HTML or a browser wrongly interprets it as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and ).

一些框架,包括 PHP 的 json_encode()(默认情况下),总是在编码器端为 ASCII 之外的任何字符执行数字转义序列.这是一个几乎不必要的额外步骤,旨在最大程度地与有限的传输机制等兼容.但是,这不应被解释为任何 JSON 解码器都存在 UTF-8 问题的迹象.

Some frameworks, including PHP's json_encode() (by default), always do the numeric escape sequences on the encoder side for any character outside of ASCII. This is a mostly unnecessary extra step intended for maximum compatibility with limited transport mechanisms and the like. However, this should not be interpreted as an indication that any JSON decoders have a problem with UTF-8.

所以,我想你可以像这样决定使用哪个:

So, I guess you just could decide which to use like this:

  • 只需使用 UTF-8,除非您用于在编码器和解码器之间进行存储或传输的任何软件都不是二进制安全的.

  • Just use UTF-8, unless any software you are using for storage or transport between encoder and decoder isn't binary-safe.

否则,请使用数字转义序列.

Otherwise, use the numeric escape sequences.

这篇关于JSON 字符编码 - 浏览器是否支持 UTF-8,还是应该使用数字转义序列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆