Java Unicode escape with more than 4 hexadecimal digits

Question

A Java properties file generated by the Ant <propertyfile> task contains the Unicode escape \u000151 for the Hungarian letter ő.

I expected \u0151. Is this a bug in Ant? (Ant 1.8.0, Java 1.7.0)

(According to the JLS, only a 4-digit Unicode escape is considered valid.)

Recommended answer

Although I found no bug report related to the issue, it is probably a bug, based on official Oracle documentation: Supplementary Characters in the Java Platform.

That document states that a supplementary character is represented by two Unicode escapes:

For cases where the character encoding used cannot represent the characters directly, the Java programming language provides a Unicode escape syntax. This syntax has not been enhanced to express supplementary characters directly. Instead, they are represented by the two consecutive Unicode escapes for the two code units in the UTF-16 representation of the character. For example, the character U+20000 is written as "\uD840\uDC00".
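The surrogate-pair rule in the quote can be checked with a few lines of Java. This is only a minimal sketch: the class name SurrogateEscapeDemo and the method toUnicodeEscapes are made up for illustration, while Character.toChars is the standard JDK call that splits a code point into its UTF-16 code units.

```java
class SurrogateEscapeDemo {
    // Hypothetical helper: formats a code point as one 4-digit escape
    // per UTF-16 code unit (one for BMP characters, two for supplementary ones).
    static String toUnicodeEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BMP character (the Hungarian letter from the question): a single escape
        System.out.println(toUnicodeEscapes(0x0151));
        // Supplementary character from the quoted documentation: two escapes
        System.out.println(toUnicodeEscapes(0x20000));
    }
}
```

For U+0151 the helper yields a single escape, and for U+20000 it yields the pair quoted above, so a 6-digit form is never needed in this syntax.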

The document also describes a Unicode escape syntax for text input (that is, one that Java does not support at the language level):

For text input, the Java 2 SDK provides a code point input method which accepts strings of the form "\Uxxxxxx", where the uppercase "U" indicates that the escape sequence contains six hexadecimal digits, thus allowing for supplementary characters. A lowercase "u" indicates the original form of the escape sequences, "\uxxxx".

This means the Ant <propertyfile> task is incorrect, at least according to the documentation above: it should generate \u0151 or \U000151 instead of \u000151 (note the uppercase and lowercase U).
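As an illustration of what a correct 4-digit encoder could look like, here is a rough sketch. The class PropertiesEscapeSketch and its methods escapeNonAscii and roundTrip are hypothetical names, not Ant's actual implementation; the sketch also assumes values contain no backslashes, separators, or leading whitespace, which a real writer would have to escape as well. Only java.util.Properties.load(Reader) is the real JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesEscapeSketch {
    // Hypothetical helper: keeps printable ASCII and writes every other
    // UTF-16 code unit as a 4-digit escape, so a supplementary character
    // becomes two consecutive escapes.
    static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    // Writes "key=<escaped value>" and reads it back with java.util.Properties.
    static String roundTrip(String value) {
        Properties p = new Properties();
        try {
            p.load(new StringReader("key=" + escapeNonAscii(value) + "\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }

    public static void main(String[] args) {
        String hungarian = String.valueOf((char) 0x0151);
        String supplementary = new String(Character.toChars(0x20000));
        System.out.println(escapeNonAscii(hungarian));
        System.out.println(roundTrip(hungarian).equals(hungarian));
        System.out.println(roundTrip(supplementary).equals(supplementary));
    }
}
```

Escaping per code unit rather than per code point is a deliberate choice here: it always produces exactly four hexadecimal digits, which is the only form the properties parser decodes.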

In practice, however, the \Uxxxxxx syntax does not seem to be supported by java.util.Properties:

[test.properties]

key1=\u0151
key2=\u000151
key3=\U000151

[PropertiesParserTest.java]

import static org.junit.Assert.assertEquals;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;
import org.junit.Test;

public class PropertiesParserTest {
    @Test
    public void testLoad() throws IOException {
        try (InputStream input = getClass().getResourceAsStream("test.properties")) {
            Properties p = new Properties();
            p.load(input);

            // Valid unicode escape
            assertEquals("ő", p.getProperty("key1"));

            // The 6-digit escape generated by Ant is read as U+0001 followed by "51"
            assertEquals("\u0001" + "51", p.getProperty("key2"));

            // \Uxxxxxx is not supported: the backslash is dropped, the rest is kept literally
            assertEquals("U000151", p.getProperty("key3"));
        }
    }

    @Test
    public void testGenerate() throws IOException {
        Properties p1 = new Properties();
        p1.setProperty("key1", "ő");
        // The compiler reads this literal as U+0001 followed by "51"
        p1.setProperty("key2", "\u000151");
        // Not supported in practice (illegal escape): p1.setProperty("key3", "\U000151");

        File file = File.createTempFile("PropertiesParserTest_", ".properties");
        System.out.println(file);

        try (OutputStream output = new FileOutputStream(file)) {
            p1.store(output, null);
        }

        try (InputStream input = new FileInputStream(file)) {
            Properties p2 = new Properties();
            p2.load(input);

            // Valid unicode escape
            assertEquals("ő", p2.getProperty("key1"));

            // The 6-digit form is U+0001 followed by "51", not ő, and round-trips as such
            assertEquals("\u0001" + "51", p2.getProperty("key2"));
        }
    }
}
