JSONObject in org.json lib: utf-8 encoding issue


Problem description

I'm following the Unicode - How to get the characters right? post.

The only issue I have is with JSONObject encoding (I'm using org.json lib).

The issue arises when I put a string like àòùè쀀, for example, in a JSONObject.

System.out.println(entry.getValue());
JSONObject temp = new JSONObject();
temp.put("values", entry.getValue());
System.out.println(temp.toString());

I obtain àòùè쀀 and {"values":"àòùèì\u20ac\u20ac"} instead of {"values":"àòùè쀀"}.

EDIT

When passing from a hashtable to a JSONObject, the same Unicode escaping is applied to keys as well as values. For example, the hashtable

 {€èòàùì€ù=èòàù€ì, €òàèùì€=èòàù€ìç§$}

becomes the JSONObject

 {"\u20acòàèùì\u20ac":"èòàù\u20acìç§$","\u20acèòàùì\u20acù":"èòàù\u20acì"}

Solution

They are exactly equal; the Unicode-escaped form just takes a bit more space. Writing \u004a in Java is exactly the same as writing J. If correctness is your concern, it doesn't matter.
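The equivalence can be checked with a small sketch. The helper below, `decodeUnicodeEscapes`, is a hypothetical illustration that only understands backslash-u sequences; it is not part of org.json and not a JSON parser. It shows that the escaped text decodes to the very same characters.

```java
class EscapeEquivalenceDemo {
    // Hypothetical helper: turns backslash-u sequences into the characters
    // they denote and copies every other two-character escape unchanged.
    static String decodeUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                if (s.charAt(i + 1) == 'u' && i + 6 <= s.length()) {
                    // Four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    // Some other escape (for example an escaped backslash): keep as is.
                    out.append(c).append(s.charAt(i + 1));
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The six-character escape sequence for the euro sign, written twice.
        String escaped = "\\u20ac\\u20ac";
        String decoded = decodeUnicodeEscapes(escaped);
        System.out.println(decoded.equals("\u20ac\u20ac")); // prints true
        System.out.println("\u004a".equals("J"));           // prints true
    }
}
```

Any conforming JSON parser performs the same decoding, so a consumer of the escaped output sees the original string.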

And it won't take a considerable amount of extra space either, unless most of your text falls between 0x2000 and 0x20FF.

The following code escapes C0 and C1 control characters, but it also escapes 0x2000 - 0x20FF:

     if (c < ' ' || (c >= '\u0080' && c < '\u00a0')
                    || (c >= '\u2000' && c < '\u2100')) {

So control characters and any character in the range 0x2000 - 0x20FF are written as Unicode escapes. This makes sense for the C0 control characters (below 0x20), because JSON does not allow those in strings in their unescaped form.
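To see the effect of that condition in isolation, here is a minimal sketch that applies only the quoted range check. The class and method names are made up for illustration, and the handling of quotes, backslashes and short escapes that a real JSON writer needs is deliberately left out.

```java
class RangeEscapeSketch {
    // Applies only the range check quoted above; everything else is copied through.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < ' ' || (c >= '\u0080' && c < '\u00a0')
                    || (c >= '\u2000' && c < '\u2100')) {
                // Emit a backslash, the letter u and four lowercase hex digits.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The string from the question, spelled with escapes so the source stays ASCII.
        String input = "\u00e0\u00f2\u00f9\u00e8\u00ec\u20ac\u20ac";
        System.out.println(escape(input));
    }
}
```

The accented letters lie between 0x00E0 and 0x00F9, outside all three ranges, so they pass through unchanged. The euro sign is 0x20AC, inside 0x2000 - 0x20FF, which is why only it shows up as \u20ac in the question's output.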

As for 0x2000 - 0x20FF, I have no idea because the code is not commented. Every character in that range is valid in JSON unescaped. However, 0x2028 and 0x2029 were not valid inside JavaScript string literals before ECMAScript 2019 (a small detail that kept JSON syntax from being a strict subset of JavaScript syntax), so it's a good idea to escape those in JSON in case it is used as JSONP, which is really JavaScript. But it is not apparent to me why the code escapes the whole range when only 2 characters in it are problematic.
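For comparison, a narrower escaper might look like the sketch below. It is an assumption about what a minimal policy could be, not what org.json does: it escapes the quote, the backslash, the C0 control characters, and the two separators 0x2028 and 0x2029, and leaves everything else (including the euro sign) untouched. All names are hypothetical.

```java
class MinimalJsonEscapeSketch {
    // Escapes only what JSON requires, plus the two line/paragraph separators
    // that are risky when the output is evaluated as JavaScript (JSONP).
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < ' ' || c == 0x2028 || c == 0x2029) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u2028b")); // separator is escaped
        System.out.println(escape("\u20ac").length()); // prints 1: euro sign is not escaped
    }
}
```

Output produced this way must be written with a UTF-8 encoder, since the non-ASCII characters are no longer protected by escapes.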
