Java Unicode escape with more than 4 hexadecimal digits

Problem description

A Java properties file generated by the Ant <propertyfile> task contains the unicode escape \u000151 for the Hungarian letter ő.

I expected \u0151. Is this a bug in Ant? (Ant 1.8.0, Java 1.7.0)

(According to the JLS, only a 4-digit Unicode escape is considered valid...)

Recommended answer

Although I found no bug reports related to this issue, it is probably a bug, based on the official Oracle document: Supplementary Characters in the Java Platform

This document states that a single supplementary character is represented by two Unicode escapes:

For cases where the character encoding used cannot represent the characters directly, the Java programming language provides a Unicode escape syntax. This syntax has not been enhanced to express supplementary characters directly. Instead, they are represented by the two consecutive Unicode escapes for the two code units in the UTF-16 representation of the character. For example, the character U+20000 is written as "\uD840\uDC00".
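As a minimal sketch of the surrogate-pair rule quoted above, the following uses the standard Character.toChars method to split a code point into its UTF-16 code units and print each one as a four-digit escape. The class and method names (SurrogateEscapeDemo, toEscapes) are made up for this illustration and do not come from the Oracle document.

```java
public class SurrogateEscapeDemo {
    // Hypothetical helper: renders one code point as one or two four-digit
    // escapes, one per UTF-16 code unit.
    public static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Supplementary character U+20000: needs a surrogate pair (two escapes)
        System.out.println(toEscapes(0x20000));
        // BMP character U+0151: a single escape is enough
        System.out.println(toEscapes(0x151));
    }
}
```

Running main should print \uD840\uDC00 for U+20000, matching the example in the quote, and \u0151 for the letter ő.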

The document also describes an escape syntax intended for text input (i.e. one that Java does not support at the language level):

For text input, the Java 2 SDK provides a code point input method which accepts strings of the form "\Uxxxxxx", where the uppercase "U" indicates that the escape sequence contains six hexadecimal digits, thus allowing for supplementary characters. A lowercase "u" indicates the original form of the escape sequences, "\uxxxx".

This means that the Ant <propertyfile> task is incorrect: it should generate \u0151 or \U000151 instead of \u000151 (note the uppercase/lowercase U), at least based on the documentation above.
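To illustrate the four-digit form the answer expects (\u0151), here is a rough sketch of an escaper that always emits exactly four hex digits per UTF-16 code unit, followed by a round trip through java.util.Properties. This is not Ant's implementation; PropertyEscapeDemo, escapeNonAscii and roundTrip are hypothetical names, and the helper deliberately ignores other characters that need escaping in properties files (backslashes, leading whitespace and so on).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertyEscapeDemo {
    // Hypothetical helper (not Ant's code): writes each non-ASCII or control
    // UTF-16 code unit as a backslash, a lowercase u and exactly four hex digits.
    public static String escapeNonAscii(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses "key=<escaped value>" with java.util.Properties and returns the value.
    public static String roundTrip(String value) {
        Properties p = new Properties();
        try {
            p.load(new StringReader("key=" + escapeNonAscii(value)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("key");
    }

    public static void main(String[] args) {
        String o = "\u0151"; // the Hungarian letter from the question
        System.out.println(escapeNonAscii(o));
        System.out.println(roundTrip(o).equals(o));
    }
}
```

With this sketch, main should print \u0151 and then true: the four-digit escape survives a load, whereas the six-digit form in the question is read as U+0001 followed by the text "51". Because the loop works on code units rather than code points, a supplementary character comes out as two consecutive escapes, consistent with the Oracle document quoted earlier.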

But in practice the \Uxxxxxx syntax seems to be unsupported:

[test.properties]

key1=\u0151
key2=\u000151
key3=\U000151

[PropertiesParserTest.java]

import static org.junit.Assert.assertEquals;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;
import org.junit.Test;

public class PropertiesParserTest {
    @Test
    public void testLoad() throws IOException {
        try (InputStream input = getClass().getResourceAsStream("test.properties")) {
            Properties p = new Properties();
            p.load(input);

            // Valid unicode escape
            assertEquals("ő", p.getProperty("key1"));

            // The 6-digit unicode escape generated by Ant is incorrect
            assertEquals("\u0001" + "51", p.getProperty("key2"));

            // \Uxxxxxx is not supported
            assertEquals("U000151", p.getProperty("key3"));
        }
    }

    @Test
    public void testGenerate() throws IOException {
        Properties p1 = new Properties();
        p1.setProperty("key1", "ő");
        p1.setProperty("key2", "\u000151");
        // Not supported in practice: p1.setProperty("key3", "\U000151");

        File file = File.createTempFile("PropertiesParserTest_", ".properties");
        System.out.println(file);

        try (OutputStream output = new FileOutputStream(file)) {
            p1.store(output, null);
        }

        try (InputStream input = new FileInputStream(file)) {
            Properties p2 = new Properties();
            p2.load(input);

            // Valid unicode escape
            assertEquals("ő", p2.getProperty("key1"));

            // The 6-digit unicode escape generated by Ant is incorrect
            assertEquals("\u0001" + "51", p2.getProperty("key2"));
        }
    }
}
