Unicode input in a console application in Java


Problem description

I have been trying to read Unicode user input in my Java application for a small utility snippet. It works out of the box on Ubuntu, which I guess uses UTF-8 as its OS-wide encoding, but it does not work on Windows when run from cmd. The code in question is as follows:

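import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class SerTest {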

    public static void main(String[] args) throws Exception {
        testUnicode();
    }

    public static void testUnicode() throws Exception {
        System.out.println("Default charset: " +
           Charset.defaultCharset().name());
        BufferedReader in  =
           new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        System.out.printf("Enter 'абвгд эюя': ");
        String line = in.readLine();
        String s = "абвгд эюя";
        byte[] sBytes = s.getBytes();
        System.out.println("strg bytes: " + Arrays.toString(sBytes));
        byte[] lineBytes = line.getBytes();
        System.out.println("line bytes: " + Arrays.toString(lineBytes));
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print("--->" + s + "<----\n");
        out.print("--->" + line + "<----\n");
    }

}

Output on Ubuntu (without any configuration changes):

me@host> javac SerTest.java  && java SerTest
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----

Output in the Windows CMD prompt (unaffected by JAVA_TOOL_OPTIONS):

E:\>chcp 65001
Active code page: 65001

E:\>java -Dfile.encoding=utf8 SerTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
Default charset: UTF-8
Enter 'абвгд эюя': юя': ': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Exception in thread "main" java.lang.NullPointerException
        at SerTest.testUnicode(SerTest.java:26) # byte[] lineBytes = line.getBytes();
        at SerTest.main(SerTest.java:15)

Output in the Eclipse console (after setting JAVA_TOOL_OPTIONS):

Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----

In the Eclipse console it works because I have added a system-wide environment variable (JAVA_TOOL_OPTIONS), which I would like to avoid if possible.

Output in Eclipse console (after removing JAVA_TOOL_OPTIONS):

Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-61, -112, -62, -80, -61, -112, -62, -79, -61, -112, -62, -78, -61, -112, -62, -77, -61, -112, -62, -76, 32, -61, -111, -17, -65, -67, -61, -111, -59, -67, -61, -111, -17, -65, -67]
--->абвгд эюя<----
--->абвгд �ю�<----

So my question is: what exactly is going on here? What code changes would be required to ensure that this snippet works for all sorts of "Unicode" input?

Sorry for the long winded question and thanks in advance,
Sasuke
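
The "line bytes" in the last Eclipse run look like double encoding: the UTF-8 bytes of the typed text appear to have been decoded with a single-byte Windows charset somewhere along the way and then encoded as UTF-8 again. A minimal sketch, assuming windows-1252 was the charset involved (the question does not say which one it was) and using a hypothetical class name, reproduces that byte sequence:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DoubleEncodingDemo {

    // Encode the text as UTF-8, misread those bytes in another charset,
    // then encode the misread text as UTF-8 a second time.
    static byte[] doubleEncode(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, wrongCharset);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Cyrillic sample string from the question, written with Unicode
        // escapes so that the encoding of this source file does not matter.
        String s = "\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f";

        // Assumption: windows-1252 is the charset that was wrongly applied.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("utf-8 bytes:    "
                + Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("double-encoded: "
                + Arrays.toString(doubleEncode(s, cp1252)));
    }
}
```

Under that assumption, bytes 0x8D and 0x8F have no windows-1252 mapping, so they decode to U+FFFD. That would account for the two replacement characters in the final output line, while the first five letters survive as two-character pairs that happen to print correctly once the UTF-8 PrintStream output is misread the same way.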

Recommended answer

Some notes:


  • -Dfile.encoding=utf8 is not supported and may cause unintended side-effects:

The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.




  • The Console class will detect and use the terminal encoding, but does not support 65001 (UTF-8) on Windows - at least, it didn't the last time I tried it

  • I believe that the correct, documented way to use Unicode with cmd.exe is to use WriteConsoleW and ReadConsoleW.

    I wrote a couple of blog posts when I was looking at this:

    • I18N: Unicode at the Windows command prompt
    • Java: Unicode on the Windows command line
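
As a rough illustration of the Console note above, here is a minimal sketch that reads a line through java.io.Console instead of wrapping System.in in a hand-picked charset. The class name and the frame helper are made up for this example. As noted, this may still not work under code page 65001 on Windows, so treat it as something to try rather than a guaranteed fix.

```java
import java.io.Console;

public class ConsoleInput {

    // Wraps the input in the same arrows the question's code prints.
    // readLine returns null at end of input, which is what led to the
    // NullPointerException in the Windows run, so handle it explicitly.
    static String frame(String line) {
        return line == null ? "(no input)" : "--->" + line + "<----";
    }

    public static void main(String[] args) {
        // System.console() may return null when stdin/stdout are redirected
        // or when an IDE launches the program without a real terminal.
        Console console = System.console();
        if (console == null) {
            System.err.println("No interactive console is attached.");
            return;
        }

        // The prompt uses Unicode escapes for the Cyrillic sample text so
        // that the encoding of this source file does not matter.
        String line = console.readLine(
                "Enter '\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f': ");

        // Write through the Console as well, so that input and output use
        // whatever encoding the Console object detected.
        console.printf("%s%n", frame(line));
    }
}
```

Reading and writing through the same Console object avoids mixing a hard-coded "UTF-8" reader with a terminal that is actually using a different code page, which is one plausible source of the mismatches shown in the question.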
