Java Unicode Problems (I think)


Question

I'm new to Java, so bear with me if I say anything stupid! I'm having a few problems, which I think are Unicode-related.

I'm using Scanner to read in tokenised commands from a text file, saved with UTF-8 encoding. Basically I want to first check whether the command equals "command1" or "command2" (I do something else in these cases), and otherwise read it in as a character. If the token isn't a single character, I'm going to output an error.

Here's my code:

public static void main(String[] args) throws FileNotFoundException {
    Scanner scanner = new Scanner(new File(args[0]));
    while (scanner.hasNext()) {
        String command = scanner.next();
        if (command.equals("command1")) {
            System.out.println("command: command1");
            // do something
        } else if (command.equals("command2")) {
            System.out.println("command: command2");
            // do something
        } else {
            if (command.length() == 1) {
                char c = command.charAt(0);
                System.out.println("character: " + c);
                // do something with c
            } else {
                System.err.println("error (string was " + command
                        + " with length " + command.length() + ")");
            }
        }
    }
}

And the contents of the text file whose filename I'm passing in args[0] for testing:

command1
x
y
command2
z
└
command1
╒
═

Expected output:

command: command1
character: x
character: y
command: command2
character: z
character: └
command: command1
character: ╒
character: ═

Actual output:

command: command1
character: x
character: y
command: command2
character: z
error (string was └ with length 3)
command: command1
error (string was ╒ with length 3)
error (string was ═ with length 3)

As you can see, Java is seeing each of the non-ASCII characters as a 3-character string. Strangely, if I copy/paste one of the characters from the terminal output into a System.out.println("└".length()) statement, it correctly prints 1.

Any ideas on where I'm going wrong?
Thanks

Recommended answer

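When you open a file in Java without specifying an encoding, Scanner decodes it with the platform default charset, which has traditionally come from the file.encoding system property. On many systems this is not set to something that you want (if you're like me, you always want UTF-8); only since JDK 18 has the default been UTF-8 on every platform. Each of your box-drawing characters takes three bytes in UTF-8, so a single-byte default charset turns each one into three chars. Printing those chars back out with the same charset can reproduce the original bytes, which would explain why your terminal still shows the right character.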
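To see the effect in isolation, here is a minimal sketch. It assumes the default charset on the asker's machine was a single-byte one such as ISO-8859-1 (the question doesn't say which), and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Encode `text` as UTF-8, decode the bytes with `charset`,
    // and return the length of the resulting String.
    static int decodedLength(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset).length();
    }

    public static void main(String[] args) {
        String box = "\u2514"; // the box-drawing character from the test file
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 byte count: " + box.getBytes(StandardCharsets.UTF_8).length);
        // ISO-8859-1 is an assumed stand-in for a single-byte platform default.
        System.out.println("length via ISO-8859-1: " + decodedLength(box, StandardCharsets.ISO_8859_1));
        System.out.println("length via UTF-8: " + decodedLength(box, StandardCharsets.UTF_8));
    }
}
```

Decoded as ISO-8859-1 the string has length 3, matching the error output in the question; decoded as UTF-8 it has length 1.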

To fix, explicitly specify your character set when you create your Scanner:

Scanner scanner = new Scanner(new File(args[0]), "UTF-8");
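For a self-contained check of that fix, the sketch below writes a small UTF-8 file and reads it back with an explicit charset. The class name, the readTokens helper, and the temporary file are illustrative assumptions, not part of the original program.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8ScannerDemo {
    // Read whitespace-separated tokens from `file`, decoding it as UTF-8
    // regardless of the platform default charset.
    static List<String> readTokens(File file) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file, written explicitly as UTF-8.
        File file = File.createTempFile("commands", ".txt");
        file.deleteOnExit();
        String contents = "command1\nx\n\u2514\n\u2552\n";
        Files.write(file.toPath(), contents.getBytes(StandardCharsets.UTF_8));

        for (String token : readTokens(file)) {
            System.out.println(token + " has length " + token.length());
        }
    }
}
```

With the charset given explicitly, each box-drawing token comes back with length 1. How the characters themselves look when printed still depends on the console's own encoding.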
