Gson Unicode characters conversion to Unicode character codes


Question

Check out my code below. I have a JSON string that contains Unicode character codes. I convert it to a Java object and then convert it back to a JSON string. However, as you can see, the input and output JSON strings don't match. Is it possible to convert my object back to the original JSON string using Gson? I want outputJson to be the same as inputJson.

static class Book {
    String description;
}

public static void test() {
    Gson gson = new Gson();

    String inputJson = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
    Book book = gson.fromJson(inputJson, Book.class);
    String outputJson = gson.toJson(book);

    System.out.println(inputJson);
    System.out.println(outputJson);
    // Prints:
    // {"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
    // {"description":"Tikroviškai parašytas kūrinys"}
}

Recommended answer

Unfortunately, Gson does not seem to support this. As of Gson 2.8.0, all JSON input and output goes through JsonReader and JsonWriter respectively. JsonReader can read Unicode escapes using its private readEscapeCharacter method. JsonWriter, by contrast, simply writes strings to the backing Writer instance and does not escape characters above 127, except for \u2028 and \u2029. Probably the only thing you can do here is write a custom escaping Writer that emits the Unicode escapes:

import java.io.IOException;
import java.io.Writer;

final class EscapedWriter
        extends Writer {

    private static final char[] hex = {
            '0', '1', '2', '3',
            '4', '5', '6', '7',
            '8', '9', 'a', 'b',
            'c', 'd', 'e', 'f'
    };

    private final Writer writer;

    // I/O components are usually not thread-safe anyway, so a single escape buffer can be
    // reused across calls instead of allocating a new one for every escaped character
    private final char[] escape = { '\\', 'u', 0, 0, 0, 0 };

    EscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    // This implementation is not very efficient and is open for enhancements:
    // * constructing a single "normalized" buffer character array so that it could be passed to the downstream writer
    //   rather than writing characters one by one
    // * etc...
    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // length is a character count, not an end index, so the range ends at offset + length
        for ( int i = offset; i < offset + length; i++ ) {
            final int ch = buffer[i];
            if ( ch < 128 ) {
                writer.write(ch);
            } else {
                escape[2] = hex[(ch & 0xF000) >> 12];
                escape[3] = hex[(ch & 0x0F00) >> 8];
                escape[4] = hex[(ch & 0x00F0) >> 4];
                escape[5] = hex[ch & 0x000F];
                writer.write(escape);
            }
        }
    }

    @Override
    public void flush()
            throws IOException {
        writer.flush();
    }

    @Override
    public void close()
            throws IOException {
        writer.close();
    }

    // Some java.io.Writer subclasses may use java.lang.Object.toString() to materialize their accumulated state by design
    // so it has to be overridden and forwarded as well
    @Override
    public String toString() {
        return writer.toString();
    }

}
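
To make the bit arithmetic in the else branch concrete, here is a minimal standalone sketch of how a single UTF-16 code unit becomes a six-character \uXXXX escape. The names EscapeDemo and toEscape are made up for illustration and belong neither to Gson nor to the answer's code. It also shows that a character outside the Basic Multilingual Plane, which Java stores as a surrogate pair, simply turns into two consecutive escapes, which is the form JSON expects for such characters.

```java
// Hypothetical helper for illustration only; not part of Gson.
final class EscapeDemo {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Builds the six-character JSON escape for one UTF-16 code unit
    static String toEscape(final char ch) {
        return new String(new char[] {
                '\\', 'u',
                HEX[(ch & 0xF000) >> 12], // most significant 4 bits
                HEX[(ch & 0x0F00) >> 8],
                HEX[(ch & 0x00F0) >> 4],
                HEX[ch & 0x000F]          // least significant 4 bits
        });
    }

    public static void main(final String[] args) {
        // U+0161 is the letter used in the question's sample text
        System.out.println(toEscape((char) 0x0161)); // prints \u0161

        // U+1D11E lies outside the BMP and is stored as the surrogate pair D834 DD1E
        final String clef = new String(Character.toChars(0x1D11E));
        final StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < clef.length(); i++) {
            escaped.append(toEscape(clef.charAt(i)));
        }
        System.out.println(escaped); // prints \ud834\udd1e
    }
}
```

Each hex digit covers four bits, so masking and shifting by 12, 8, 4 and 0 bits yields the four digits from left to right.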

EscapedWriter is NOT well tested, and it does not respect \u2028 and \u2029. With it in place, just pass it as the output destination when invoking the toJson method:

final String input = "{\"description\":\"Tikrovi\\u0161kai para\\u0161ytas k\\u016brinys\"}";
final Book book = gson.fromJson(input, Book.class);
final Writer output = new EscapedWriter(new StringWriter());
gson.toJson(book, output);
System.out.println(input);
System.out.println(output);

Output:

{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
{"description":"Tikrovi\u0161kai para\u0161ytas k\u016brinys"}
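
The comments in EscapedWriter mention one possible enhancement: building the converted chunk first and handing it to the downstream writer in a single call, rather than writing characters one by one. A minimal sketch of that idea follows. The class name BufferedEscapedWriter is hypothetical, the sketch is as lightly tested as the original, and whether it is actually faster depends on the downstream Writer, so treat it as a starting point rather than a measured improvement.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical variant of EscapedWriter; the name is made up for this sketch.
final class BufferedEscapedWriter extends Writer {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    private final Writer writer;

    BufferedEscapedWriter(final Writer writer) {
        this.writer = writer;
    }

    @Override
    public void write(final char[] buffer, final int offset, final int length)
            throws IOException {
        // Collect the whole converted chunk before touching the downstream writer
        final StringBuilder out = new StringBuilder(length + 16);
        // The characters to process are buffer[offset] .. buffer[offset + length - 1]
        for (int i = offset; i < offset + length; i++) {
            final char ch = buffer[i];
            if (ch < 128) {
                out.append(ch);
            } else {
                out.append('\\').append('u')
                        .append(HEX[(ch & 0xF000) >> 12])
                        .append(HEX[(ch & 0x0F00) >> 8])
                        .append(HEX[(ch & 0x00F0) >> 4])
                        .append(HEX[ch & 0x000F]);
            }
        }
        // One downstream call per chunk
        writer.write(out.toString());
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Forwarded so that wrapping a StringWriter still exposes the accumulated text
    @Override
    public String toString() {
        return writer.toString();
    }

    public static void main(final String[] args) throws IOException {
        final Writer output = new BufferedEscapedWriter(new StringWriter());
        output.write("k\u016brinys");
        System.out.println(output); // prints k\u016brinys
    }
}
```

It can be passed to gson.toJson(book, output) in the same way as EscapedWriter.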

It's an interesting problem, and you might also raise an issue on google/gson asking for a string-writing configuration option, or at least to get some comments from the development team. I believe they are well aware of this behavior and made it work this way by design, but they could shed some light on the reasons. The only one I can think of right now is performance, since skipping an additional transformation before writing a string is cheaper, but that is a weak guess.
