Java Unicode Confusion


Question

Hey all, I have only just started learning Java and have run into something that is really confusing!

I was typing out an example from the book I am using. It is meant to demonstrate the char data type.

The code is as follows:

public class CharDemo
{
    public static void main(String[] args)
    {
        char a = 'A';
        char b = (char) (a + 1);
        System.out.println(a + b);
        System.out.println("a + b is " + a + b);
        int x = 75;
        char y = (char) x;
        char half = '\u00AB';
        System.out.println("y is " + y + " and half is " + half);
    }
}

The bit that is confusing me is the statement char half = '\u00AB'. The book states that \u00AB is the code for the symbol '1/2'. As described, when I compile and run the program from cmd, the symbol produced on this line is in fact a '1/2'.

So everything appears to be working as it should. I decided to play around with the code and try some different Unicode values. I googled multiple Unicode tables and found none of them to be consistent with the above result.

Every one I found stated that the code \u00AB was not for '1/2' and was in fact for this:

http://www.fileformat.info/info/unic...r/ab/index.htm

So what character set is Java using? I thought Unicode was supposed to be just that: Uni, only one. I have searched for hours and nowhere can I find a character set that states \u00AB is equal to 1/2, yet this is what my Java compiler interprets it as.

I must be missing something obvious here! Thanks for any help!

Recommended answer

It's a well-known problem caused by a console encoding mismatch on Windows platforms.

The Java runtime expects the system console to use the same encoding as the system default encoding. However, Windows uses two separate encodings: the ANSI code page (the system default encoding) and the OEM code page (the console encoding).
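To see which encodings a particular JVM believes it is working with, something like the following sketch may help. The class name EncodingReport and the method report() are made up for illustration. The system properties queried are common on HotSpot-based JDKs but are not guaranteed to be set on every version or vendor, so a missing one is reported as "null". Note also that newer JDKs may default to UTF-8 regardless of the Windows ANSI code page, so the values can differ from the situation described above.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the JVM's default charset plus a few encoding-related
    // system properties. Properties that are not set show up as "null".
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        String[] keys = {"file.encoding", "sun.jnu.encoding", "sun.stdout.encoding"};
        for (String key : keys) {
            info.put(key, String.valueOf(System.getProperty(key)));
        }
        return info;
    }

    public static void main(String[] args) {
        report().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

On Windows, the console's own code page can be compared against these values by running chcp in the same cmd window.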

So, when you try to write the Unicode character U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK to the console, the Java runtime assumes that the console encoding is the ANSI encoding (Windows-1252 in your case), where this character is represented as the byte 0xAB. However, the actual console encoding is the OEM encoding (CP437 in your case), where 0xAB means ½.
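The mismatch can be reproduced without a Windows console by encoding with one charset and decoding with the other. This is a minimal sketch, assuming the JDK in use ships both the "windows-1252" and "IBM437" (CP437) charsets, which is typical for a full JDK but not guaranteed. The class name ConsoleMismatchDemo and the helper asSeenOnConsole are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;

public class ConsoleMismatchDemo {
    // Encodes text with the charset the runtime assumes, then decodes the
    // same bytes with the charset the console actually uses.
    static String asSeenOnConsole(String text, Charset assumed, Charset actual) {
        byte[] bytes = text.getBytes(assumed);
        return new String(bytes, actual);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252"); // assumed by the runtime
        Charset oem = Charset.forName("IBM437");        // used by the console

        String original = "\u00AB";
        byte[] bytes = original.getBytes(ansi);
        String shown = asSeenOnConsole(original, ansi, oem);

        // Print code points in ASCII so the output does not depend on the console.
        System.out.printf("byte written: 0x%02X%n", bytes[0] & 0xFF);
        System.out.printf("console shows: U+%04X%n", (int) shown.charAt(0));
    }
}
```

This prints 0xAB for the byte and U+00BD for the displayed character. U+00BD is VULGAR FRACTION ONE HALF, so the book's '\u00AB' only appears to be ½ because of the console; the code point for ½ itself is '\u00BD'.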

Therefore, printing data to the Windows console with System.out.println() produces wrong results.

To get correct results you can use System.console().writer().println() instead.
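A sketch of how that might look in practice is below. System.console() can return null when the program is not attached to an interactive console (for example, when output is redirected or when run from some IDEs), so this version falls back to System.out in that case. The class name ConsoleWriterDemo and the helper consoleOrStdout are hypothetical.

```java
import java.io.Console;
import java.io.PrintWriter;

public class ConsoleWriterDemo {
    // Returns the console's writer when a console is available,
    // otherwise an auto-flushing wrapper around System.out.
    static PrintWriter consoleOrStdout() {
        Console console = System.console();
        if (console != null) {
            return console.writer();
        }
        return new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        char quote = '\u00AB'; // left-pointing double angle quotation mark
        char half = '\u00BD';  // vulgar fraction one half
        out.println("quote is " + quote + " and half is " + half);
        out.flush();
    }
}
```

Whether both characters display correctly still depends on the console font and code page being able to represent them.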
