Java Unicode Confusion
Question
Hey all, I have only just started learning Java and have run into something that is really confusing!
I was typing out an example from the book I am using. It is meant to demonstrate the char data type.
The code is as follows:
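public class CharDemo
{
    public static void main(String[] args)
    {
        char a = 'A';
        char b = (char) (a + 1);   // 'B'
        // char + char is promoted to int, so this prints 131 (65 + 66)
        System.out.println(a + b);
        // String concatenation keeps the chars: prints "a + b is AB"
        System.out.println("a + b is " + a + b);
        int x = 75;
        char y = (char) x;         // 'K'
        char half = '\u00AB';      // the character the book describes as 1/2
        System.out.println("y is " + y + " and half is " + half);
    }
}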
The bit that is confusing me is the statement char half = '\u00AB'. The book states that \u00AB is the code for the symbol '1/2'. As described, when I compile and run the program from cmd, the symbol produced on this line is in fact a '1/2'.
So everything appears to be working as it should. I decided to play around with the code and try some different Unicode values. I googled multiple Unicode tables and found that none of them were consistent with the above result.
In every one I found, the code \u00AB was listed not as '1/2' but as this:
http://www.fileformat.info/info/unic...r/ab/index.htm
So what character set is Java using? I thought Unicode was supposed to be just that: Uni, only one. I have searched for hours and nowhere can I find a character set that states \u00AB is equal to 1/2, yet this is what my Java compiler interprets it as.
I must be missing something obvious here! Thanks for any help!
Recommended answer
This is a well-known problem caused by a console encoding mismatch on Windows platforms.
The Java runtime expects the encoding used by the system console to be the same as the system default encoding. However, Windows uses two separate encodings: the ANSI code page (the system default encoding) and the OEM code page (the console encoding).
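To see which encodings a particular JVM has picked up, a small diagnostic along these lines may help. It is only a sketch: Charset.defaultCharset() is standard API, but the property names sun.jnu.encoding and sun.stdout.encoding are implementation details of some JDKs and are assumed here, so they may be missing on yours. The class name ShowEncodings and the helper describe are made up for illustration.

```java
import java.nio.charset.Charset;

public class ShowEncodings {
    // Hypothetical helper: formats a system property, tolerating absent ones.
    static String describe(String property) {
        String value = System.getProperty(property);
        return property + " = " + (value != null ? value : "(not set)");
    }

    public static void main(String[] args) {
        // The charset Java uses by default when converting chars to bytes.
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println(describe("file.encoding"));
        // Assumption: these two are JDK-internal properties and may be absent.
        System.out.println(describe("sun.jnu.encoding"));
        System.out.println(describe("sun.stdout.encoding"));
    }
}
```

On Windows, running chcp in the same cmd window reports the active console (OEM) code page, which can then be compared with what the JVM reports.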
So, when you try to write the Unicode character U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK to the console, the Java runtime assumes that the console encoding is the ANSI encoding (Windows-1252 in your case), where this character is represented as the byte 0xAB. However, the actual console encoding is the OEM encoding (CP437 in your case), where 0xAB means ½.
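The mismatch can be reproduced without a Windows console by encoding the character with one charset and decoding the bytes with the other. The sketch below assumes the JDK in use ships the windows-1252 and IBM437 (CP437) charsets; the class name EncodingMismatchDemo and the helper reinterpret are hypothetical. It prints code points rather than the characters themselves so that the output does not depend on the console encoding.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {
    // Encode a char with one charset, then decode the resulting bytes with another.
    static String reinterpret(char c, String writtenAs, String readAs) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        char half = '\u00AB';

        // What Java writes when it assumes the console is Windows-1252.
        byte[] bytes = String.valueOf(half).getBytes(Charset.forName("windows-1252"));
        System.out.printf("byte written: 0x%02X%n", bytes[0] & 0xFF);

        // What a CP437 console makes of that same byte.
        String seen = reinterpret(half, "windows-1252", "IBM437");
        System.out.printf("console shows: U+%04X%n", (int) seen.charAt(0));
    }
}
```

U+00BD is VULGAR FRACTION ONE HALF, which is the character the book's example ends up displaying.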
Therefore, printing data to the Windows console with System.out.println() produces wrong results.
To get correct results, you can use System.console().writer().println() instead.
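A minimal sketch of that suggestion follows. System.console() can return null when no console is attached (for example when output is redirected to a file), so the fallback to System.out is an addition of mine rather than part of the answer, and the class name ConsoleWriterDemo and helper consoleOrStdout are hypothetical.

```java
import java.io.Console;
import java.io.PrintWriter;

public class ConsoleWriterDemo {
    // Prefer the console's own writer; fall back to System.out if there is no console.
    static PrintWriter consoleOrStdout() {
        Console console = System.console();
        return console != null ? console.writer() : new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        // U+00AB is the left-pointing double angle quotation mark.
        out.println("U+00AB is " + '\u00AB');
        // U+00BD is the actual Unicode code point for one half.
        out.println("U+00BD is " + '\u00BD');
        out.flush();
    }
}
```

With the console writer, '\u00AB' should appear as the angle quotation mark; if the 1/2 symbol is what you actually want, '\u00BD' is the code point to use.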