Gson Unicode characters conversion to Unicode character codes
Problem description
Check out my code below. I have a JSON string that contains Unicode character codes. I convert it to my Java object and then convert it back to a JSON string. However, you can see that the input and output JSON strings don't match. Is it possible to convert my object back to the original JSON string using Gson? I want outputJson to be the same as inputJson.
static class Book {
    String description;
}

public static void test() {
    Gson gson = new Gson();
    String inputJson = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
    Book book = gson.fromJson(inputJson, Book.class);
    String outputJson = gson.toJson(book);
    System.out.println(inputJson);
    System.out.println(outputJson);
    // Prints:
    // {"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
    // {"description":"Tikroviškai parašytas kūrinys"}
}
Recommended answer
Unfortunately, Gson does not seem to support it. As of Gson 2.8.0, all JSON input and output goes through JsonReader and JsonWriter respectively. JsonReader can read Unicode escapes using its private readEscapeCharacter method. However, unlike JsonReader, JsonWriter simply writes a string to the backing Writer instance and does not escape characters above 127, except for \u2028 and \u2029. Probably the only thing you can do here is write a custom escaping Writer that emits Unicode escapes:
import java.io.IOException;
import java.io.Writer;

final class EscapedWriter
        extends Writer {

    private static final char[] hex = {
            '0', '1', '2', '3',
            '4', '5', '6', '7',
            '8', '9', 'a', 'b',
            'c', 'd', 'e', 'f'
    };

    private final Writer writer;

    // I/O components are usually implemented in not thread-safe manner
    // so we can save some time on constructing a single UTF-16 escape
    private final char[] escape = { '\\', 'u', 0, 0, 0, 0 };

    EscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    // This implementation is not very efficient and is open for enhancements:
    // * constructing a single "normalized" buffer character array so that it could be passed to the downstream writer
    //   rather than writing characters one by one
    // * etc...
    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // The range to write is [offset, offset + length), not [offset, length)
        final int end = offset + length;
        for ( int i = offset; i < end; i++ ) {
            final int ch = buffer[i];
            if ( ch < 128 ) {
                // ASCII characters pass through unchanged
                writer.write(ch);
            } else {
                // Everything else becomes a six-character UTF-16 escape
                escape[2] = hex[(ch & 0xF000) >> 12];
                escape[3] = hex[(ch & 0x0F00) >> 8];
                escape[4] = hex[(ch & 0x00F0) >> 4];
                escape[5] = hex[ch & 0x000F];
                writer.write(escape);
            }
        }
    }

    @Override
    public void flush()
            throws IOException {
        writer.flush();
    }

    @Override
    public void close()
            throws IOException {
        writer.close();
    }

    // Some java.io.Writer subclasses may use java.lang.Object.toString() to materialize their accumulated state by design
    // so it has to be overridden and forwarded as well
    @Override
    public String toString() {
        return writer.toString();
    }
}
This writer is NOT well-tested, and it does not treat \u2028 and \u2029 specially. Then just pass it as the output destination when invoking the toJson method:
final Gson gson = new Gson();
final String input = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
final Book book = gson.fromJson(input, Book.class);
final Writer output = new EscapedWriter(new StringWriter());
gson.toJson(book, output);
System.out.println(input);
System.out.println(output);
Output:
{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
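The comments in the writer above mention one possible enhancement: building a single "normalized" buffer and handing it to the downstream writer in one call instead of writing character by character. A minimal sketch of that idea follows. The class name BufferedEscapedWriter is hypothetical, the code uses only java.io, and it is as lightly tested as the original.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical buffered variant of the EscapedWriter shown earlier (the name is an assumption).
final class BufferedEscapedWriter extends Writer {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final Writer writer;

    BufferedEscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    @Override
    public void write(final char[] buffer, final int offset, final int length) throws IOException {
        // Collect the escaped form of the whole chunk first...
        final StringBuilder out = new StringBuilder(length + 16);
        final int end = offset + length;
        for (int i = offset; i < end; i++) {
            final char ch = buffer[i];
            if (ch < 128) {
                out.append(ch);
            } else {
                // Non-ASCII UTF-16 code unit: backslash, 'u', four lowercase hex digits
                out.append('\\').append('u')
                   .append(HEX[(ch >> 12) & 0xF])
                   .append(HEX[(ch >> 8) & 0xF])
                   .append(HEX[(ch >> 4) & 0xF])
                   .append(HEX[ch & 0xF]);
            }
        }
        // ...then hand it to the downstream writer in a single call.
        writer.write(out.toString());
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Forwarded so that wrapping a StringWriter still exposes the accumulated text
    @Override
    public String toString() {
        return writer.toString();
    }

    public static void main(final String[] args) throws IOException {
        final Writer output = new BufferedEscapedWriter(new StringWriter());
        // Java source escapes keep this file ASCII-only; at run time the string holds the real characters
        output.write("Tikrovi\u0161kai para\u0161ytas k\u016brinys");
        System.out.println(output);
    }
}
```

It can be passed to gson.toJson(book, output) in the same way as EscapedWriter. Whether it is actually faster depends on the downstream writer, so measure before relying on it.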
It's an interesting problem, and you might also raise an issue on google/gson asking for a string-writing configuration option, or at least to get some comments from the development team. I believe they are well aware of this behavior and made it work like that by design, but they could also shed some light on it. The only reason I can think of right now is performance, since no additional transformation is made before writing a string, but that is a weak guess.
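If a String is all you need, another workaround that does not depend on any Gson option is to post-process the text returned by toJson. The sketch below is an assumption-laden illustration rather than anything Gson provides: the class JsonAsciiEscaper and the method escapeNonAscii are hypothetical names. It relies on the fact that in valid JSON non-ASCII characters can only occur inside string values, so escaping every UTF-16 code unit above 127 does not change the meaning of the document.

```java
// Hypothetical helper (not part of Gson): escapes non-ASCII characters in already-serialized JSON.
final class JsonAsciiEscaper {

    private JsonAsciiEscaper() {
    }

    static String escapeNonAscii(final String json) {
        final StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            final char ch = json.charAt(i);
            if (ch < 128) {
                out.append(ch);
            } else {
                // Formats the UTF-16 code unit as backslash, 'u', and four lowercase hex digits
                out.append(String.format("\\u%04x", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        // Stands in for the result of gson.toJson(book); the Java source escapes
        // below become the real characters at compile time
        final String outputJson = "{\"description\":\"Tikrovi\u0161kai para\u0161ytas k\u016brinys\"}";
        System.out.println(escapeNonAscii(outputJson));
    }
}
```

Characters outside the Basic Multilingual Plane come out as two escapes, one per surrogate, which is how JSON represents them. This approach makes a second pass over the whole string, so the custom Writer is likely the better fit for large outputs.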