Unicode Characters Appearing as Question Marks in Java JSON Parsing


Question

I have been searching for an answer to this for the past few days, but I have not been able to find a good pointer. If this turns out to be a duplicate, please merge it with the appropriate question.

I am pretty new to working with JSON, and as part of one of my projects I need to decode a JSON file and do further processing on it. However, when I tried decoding it with the Json-simple library, I got some weird question marks in the parsed object instead of the actual characters. Sample code is shown below:

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

String str = "{\"alias\": [\"Evr\u00f3pa\", \"\u05d0\u05d9\u05e8\u05d5\u05e4\"]}";
JSONParser parser = new JSONParser();
JSONObject jsonObject = (JSONObject) parser.parse(str);

System.out.println(jsonObject) gives {"alias":["Evrópa","?????"]}

I tried Json-lib too, with the same result.

Thanks for your help.

Recommended answer

The problem isn't with your JSON; it's with your System.out.println(). Those characters can't be represented either in the character encoding of your terminal (or your IDE's console, if that is where you ran it) or in the encoding that System.out uses in your environment.
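One way to confirm that the parsed data is intact, regardless of what the console can display, is to print non-ASCII characters as escapes. The sketch below is only an illustration: the class name EscapeDemo and the helper escapeNonAscii are hypothetical, and it works on a plain String rather than on a parsed JSONObject.

```java
class EscapeDemo {
    // Returns a copy of text in which every UTF-16 code unit outside the
    // ASCII range is replaced by a backslash, the letter u, and four
    // lowercase hex digits, so the result is safe to print on any console.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two alias values from the question.
        System.out.println(escapeNonAscii("Evr\u00f3pa"));
        System.out.println(escapeNonAscii("\u05d0\u05d9\u05e8\u05d5\u05e4"));
    }
}
```

If the escaped output shows the code points you put in, the string in memory is correct and only the display step is losing characters.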

Files cannot contain Unicode characters directly. Files are streams of bytes, whereas a Unicode character is an abstract code point that occupies one or more bytes once it is encoded (one to four bytes in UTF-8; in memory, Java strings store characters as 16-bit UTF-16 code units). This is where character encodings become relevant. Unicode characters must be converted to a sequence of bytes before they can be written to a file (including System.out). One of the most commonly used encodings for Unicode characters is UTF-8. The trick for programmers is to always use the correct character encoding when converting between bytes and characters. Using the wrong encoding in even a single place, for example in a debug println() call, will give erroneous and misleading output.
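As a rough illustration of the point about encodings, the sketch below encodes the Hebrew alias from the question with two different charsets and then prints it through a PrintStream that is explicitly set to UTF-8. The class name EncodingDemo and the helper encode are hypothetical, and the last step assumes the terminal itself decodes its input as UTF-8; if it does not, the output will still look wrong.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Converts characters to bytes using the given charset. Characters the
    // charset cannot represent are replaced by its default replacement,
    // which is the question mark for US-ASCII.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hebrew = "\u05d0\u05d9\u05e8\u05d5\u05e4";

        // US-ASCII cannot represent Hebrew letters: one '?' byte per character.
        byte[] ascii = encode(hebrew, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // UTF-8 can: each of these five letters becomes two bytes.
        byte[] utf8 = encode(hebrew, StandardCharsets.UTF_8);
        System.out.println(utf8.length);

        // Wrap System.out so that characters are written as UTF-8 bytes.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(hebrew);
    }
}
```

The first println reproduces the symptom from the question: the string is fine, but the encoding step turns every unmappable character into a question mark.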
