JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?

Problem description

I am writing a webservice that uses json to represent its resources, and I am a bit stuck thinking about the best way to encode the json. Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8. But the rfc also describes a string escaping mechanism for specifying characters. I assume this would generally be used to escape non-ascii characters, thereby making the resulting utf-8 valid ascii.

So let's say I have a json string that contains unicode characters (code-points) that are non-ascii. Should my webservice just utf-8 encode that and return it, or should it escape all those non-ascii characters and return pure ascii?

I'd like browsers to be able to execute the results using jsonp or eval. Does that affect the decision? My knowledge of various browsers' javascript support for utf-8 is lacking.

EDIT: I wanted to clarify that my main concern about how to encode the results is really about browser handling of the results. What I've read indicates that browsers may be sensitive to the encoding when using JSONP in particular. I haven't found any really good info on the subject, so I'll have to start doing some testing to see what happens. Ideally I'd like to only escape those few characters that are required and just utf-8 encode the results.

Solution

The JSON spec requires UTF-8 support by decoders. As a result, all JSON decoders can handle UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for Javascript interpreters, which means JSONP will handle the UTF-8 encoded JSON as well.
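
As a rough illustration of that equivalence, the sketch below decodes the same value written both ways. It uses Python's standard `json` module as a stand-in for "a JSON decoder"; the sample string and variable names are assumptions made for the example, not part of the original answer.

```python
import json

# The same JSON document written two ways.
raw_utf8 = '{"name": "café ☕"}'.encode("utf-8")   # bytes containing non-ASCII UTF-8
escaped = b'{"name": "caf\\u00e9 \\u2615"}'         # pure-ASCII bytes using \uXXXX escapes

decoded_raw = json.loads(raw_utf8.decode("utf-8"))
decoded_escaped = json.loads(escaped.decode("ascii"))

# A conforming decoder yields the same value from either form.
same_value = decoded_raw == decoded_escaped
print(same_value)
```

A browser's JSON parser should behave the same way, provided the UTF-8 response is actually labelled and read as UTF-8.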

The ability for JSON encoders to use the numeric escape sequences instead just offers you more choice. One reason you may choose the numeric escape sequences would be if a transport mechanism in between your encoder and the intended decoder is not binary-safe.
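
To make the encoder-side choice concrete, here is a minimal sketch using Python's `json.dumps`, whose `ensure_ascii` parameter switches between the two styles. The payload is made up for the example; other encoders expose the same choice under different names.

```python
import json

payload = {"city": "Zürich", "emoji": "😀"}

# ensure_ascii=True (Python's default) emits \uXXXX escapes, so the text is pure ASCII.
# Characters outside the Basic Multilingual Plane become a surrogate pair of escapes.
ascii_json = json.dumps(payload, ensure_ascii=True)

# ensure_ascii=False keeps the characters as they are; encode the text as UTF-8 for the wire.
utf8_json = json.dumps(payload, ensure_ascii=False)
utf8_bytes = utf8_json.encode("utf-8")

print(ascii_json)
print(ascii_json.isascii(), utf8_json.isascii())

# Both forms decode back to the same data.
assert json.loads(ascii_json) == json.loads(utf8_bytes.decode("utf-8")) == payload
```

The escaped form is larger on the wire but survives channels that only pass 7-bit ASCII reliably.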

Another reason you may want to use numeric escape sequences is to prevent certain characters appearing in the stream, such as <, & and ", which may be interpreted as HTML sequences if the JSON code is placed without escaping into HTML or a browser wrongly interprets it as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and \).
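
One way this could look in practice is sketched below. `dumps_html_safe` is a hypothetical helper written for this example, not an API from any library, and it covers only `<`, `>` and `&`; quotes are left to JSON's normal `\"` escaping. It relies on the fact that in valid JSON these three characters can only occur inside string literals, so replacing them with `\uXXXX` escapes does not change the decoded value.

```python
import json

# Hypothetical mapping: characters an HTML parser could act on, and their JSON escapes.
HTML_SENSITIVE = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}

def dumps_html_safe(obj) -> str:
    """Serialize obj to JSON with <, > and & written as numeric escapes."""
    text = json.dumps(obj, ensure_ascii=False)
    for char, escape in HTML_SENSITIVE.items():
        text = text.replace(char, escape)
    return text

data = {"comment": '</script><b>Tom & "Jerry"</b>'}
safe = dumps_html_safe(data)
print(safe)

# None of the HTML-sensitive characters remain, and the data round-trips unchanged.
assert "<" not in safe and ">" not in safe and "&" not in safe
assert json.loads(safe) == data
```

This is a sketch of the idea only; output that is embedded in HTML still needs escaping appropriate to the context it is placed in.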

Some frameworks, including PHP's implementation of JSON, by default do the numeric escape sequences on the encoder side for any character outside of ASCII. This is intended for maximum compatibility with limited transport mechanisms and the like. However, this should not be interpreted as an indication that JSON decoders have a problem with UTF-8.

So, I guess you could decide which to use like this:

  • Just use UTF-8, unless your method of storage or transport between encoder and decoder is not binary-safe.

  • Otherwise, use the numeric escape sequences.
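
The rule above can be sketched as a small function. `encode_for_transport` and its `binary_safe` flag are hypothetical names chosen for this example; whether a given channel is binary-safe is something you have to determine yourself.

```python
import json

def encode_for_transport(obj, binary_safe: bool) -> bytes:
    """Return JSON bytes: raw UTF-8 if the channel is binary-safe, ASCII-only otherwise."""
    # With ensure_ascii=True every non-ASCII character becomes a \uXXXX escape,
    # so the UTF-8 encoding of the result contains only ASCII bytes.
    text = json.dumps(obj, ensure_ascii=not binary_safe)
    return text.encode("utf-8")

wire_utf8 = encode_for_transport({"k": "é"}, binary_safe=True)
wire_ascii = encode_for_transport({"k": "é"}, binary_safe=False)
print(wire_utf8, wire_ascii)

# Either way, the receiver decodes the same value.
assert json.loads(wire_utf8.decode("utf-8")) == json.loads(wire_ascii.decode("utf-8"))
```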
