Unicode escape syntax in Java

Question

In Java, I learned that the following syntax can be used to write Unicode characters that are not on the keyboard (e.g. non-ASCII characters):

(\u)(u)*(HexDigit)(HexDigit)(HexDigit)(HexDigit)

My question is: What is the purpose of (u)* in the above syntax?

One use case I understand is this one, which represents the yen sign in Java:

char ch = '\u00A5';


Answer

Interesting question. Section 3.3 of the JLS (Java Language Specification) says:

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

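which translates to the regular expression \\u+\p{XDigit}{4}: a backslash, one or more u's, then exactly four hexadecimal digits.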
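As a rough illustration of the grammar, the sketch below shows that javac accepts a Unicode escape with any number of u's, and uses java.util.regex to find escape-shaped text in raw source. The class and method names are hypothetical. The regex scan is a simplification: it ignores the JLS rule that only a backslash preceded by an even number of backslashes can start an escape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical class): the UnicodeMarker rule allows one or more 'u's.
final class UnicodeEscapeSyntax {
    // javac translates all three escapes before lexing, so each constant is U+00A5 (yen sign).
    static final char ONE_U = '\u00A5';
    static final char TWO_U = '\uu00A5';
    static final char FIVE_U = '\uuuuu00A5';

    // Backslash, one or more 'u', then exactly four hex digits.
    // Simplification: does not check that the backslash is eligible
    // (preceded by an even number of backslashes).
    static final Pattern ESCAPE = Pattern.compile("\\\\u+\\p{XDigit}{4}");

    // Returns every escape-shaped substring of the given raw source text.
    static List<String> findEscapes(String sourceText) {
        List<String> found = new ArrayList<>();
        Matcher m = ESCAPE.matcher(sourceText);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(ONE_U == TWO_U && TWO_U == FIVE_U); // true
        System.out.println((int) FIVE_U);                      // 165 (0xA5)
        // Raw source text; backslashes are doubled so this file itself stays valid.
        String source = "char a = '\\u00A5'; char b = '\\uuu00A5';";
        System.out.println(findEscapes(source).size());        // 2
    }
}
```

The char constants show the compiler's view (all spellings are the same character), while findEscapes works on source text as a string, which is why its input has doubled backslashes.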


If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
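The compile-time error rule can be sketched as a small checker. It is a hypothetical helper, not a JDK API, and it assumes the eligibility rule from the same JLS section: a backslash can start a Unicode escape only if it is preceded by an even number of contiguous backslashes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a JDK API): finds eligible backslashes that are followed by
// one or more 'u' characters but not by four hex digits after the last 'u'.
// JLS 3.3 makes such input a compile-time error.
final class UnicodeEscapeChecker {

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Returns the index of each malformed escape in the raw source text.
    static List<Integer> malformedEscapes(String source) {
        List<Integer> errors = new ArrayList<>();
        int n = source.length();
        int backslashes = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < n) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < n && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < n && source.charAt(j) == 'u') {
                    j++; // consume the whole UnicodeMarker (one or more 'u')
                }
                boolean ok = j + 4 <= n;
                for (int k = j; ok && k < j + 4; k++) {
                    ok = isHexDigit(source.charAt(k));
                }
                if (ok) {
                    i = j + 4; // skip a well-formed escape
                } else {
                    errors.add(i);
                    i = j;
                }
                backslashes = 0; // the char at the new index i does not follow a raw backslash
            } else {
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Raw source snippets; backslashes are doubled so this file itself stays valid.
        String good = "char c = '\\uuu00A5';";
        String bad = "String path = \"C:\\users\";"; // a classic accidental escape
        System.out.println(malformedEscapes(good)); // []
        System.out.println(malformedEscapes(bad));  // [17]
    }
}
```

The bad snippet is the familiar accidental case: a Windows path such as C:\users inside a string literal does not compile, because the backslash before "users" is eligible and the last u is followed by "sers" rather than four hex digits.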

So you're right, there can be one or more u after the backslash. The reason is given further down:


The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.

So this input

 \u0020ä

becomes

 \uu0020\u00e4

Here, the uu in the first escape means "this was a Unicode escape to begin with", while the single u in the second says "an automatic tool converted a non-ASCII character to a Unicode escape."
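The forward transformation can be sketched as follows. This is a minimal illustration, not a tool shipped with the JDK: the class name is hypothetical, existing escapes in the input are assumed to be well-formed, and each non-ASCII UTF-16 code unit is written as four lowercase hex digits to match the example above.

```java
// Hypothetical helper sketching the ASCII transformation described in JLS 3.3:
// every eligible existing escape gets one extra 'u', and every non-ASCII char
// becomes an escape with a single 'u'. Assumes the input escapes are well-formed.
final class ToAsciiTransform {

    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous raw backslashes immediately before index i
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                boolean eligible = backslashes % 2 == 0;
                out.append(c);
                if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                    out.append('u'); // extra 'u' marks an escape that was already in the source
                    backslashes = 0;
                } else {
                    backslashes++;
                }
            } else {
                backslashes = 0;
                if (c > 0x7F) {
                    // UTF-16 code unit as four lowercase hex digits, e.g. 00e4 for a-umlaut
                    out.append(String.format("\\u%04x", (int) c));
                } else {
                    out.append(c);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The answer's input: an existing escape followed by a-umlaut (itself written
        // as an escape here so this file stays pure ASCII).
        String original = "\\u0020\u00e4";
        System.out.println(toAscii(original).equals("\\uu0020\\u00e4")); // true
    }
}
```

Tracking the count of preceding backslashes matters: an escaped backslash followed by u (as in a string literal containing a literal backslash and a u) is not a Unicode escape, so it must pass through unchanged.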

This information is useful when you want to convert back from ASCII to Unicode: as the quoted JLS passage says, you can restore the original source exactly.
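The reverse direction can be sketched the same way. Again this is a hypothetical helper, assuming every eligible escape in the ASCII input is well-formed: escapes with several u's lose one u, and escapes with a single u become the character they denote.

```java
// Hypothetical helper sketching the reverse direction from JLS 3.3: an escape with
// several 'u's loses one 'u', and an escape with a single 'u' becomes the char it
// denotes. Assumes every eligible escape in the input is well-formed.
final class FromAsciiTransform {

    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int n = ascii.length();
        int backslashes = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < n) {
            char c = ascii.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < n && ascii.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < n && ascii.charAt(j) == 'u') {
                    j++; // count the 'u' characters
                }
                int uCount = j - (i + 1);
                String hex = ascii.substring(j, j + 4);
                if (uCount == 1) {
                    // Added by the ASCII transformation: restore the original character.
                    out.append((char) Integer.parseInt(hex, 16));
                } else {
                    // Escape present in the original source: drop one 'u'.
                    out.append('\\');
                    for (int k = 1; k < uCount; k++) {
                        out.append('u');
                    }
                    out.append(hex);
                }
                i = j + 4;
                backslashes = 0;
            } else {
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String ascii = "\\uu0020\\u00e4"; // the ASCII form from the answer
        // Expect the original text: an escape for a space, then a-umlaut.
        System.out.println(fromAscii(ascii).equals("\\u0020\u00e4")); // true
    }
}
```

Applied to the answer's ASCII form, this gives back the original text: the escape that was in the source to begin with stays an escape, and only the tool-generated one turns back into the character.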
