Default character encoding for Java console output


Problem description


How does Java determine the encoding used for System.out?

Given the following class:

import java.io.File;
import java.io.PrintWriter;

public class Foo
{
    public static void main(String[] args) throws Exception
    {
        String s = "xxäñxx";
        System.out.println(s);
        PrintWriter out = new PrintWriter(new File("test.txt"), "UTF-8");
        out.println(s);
        out.close();
    }
}

It is saved as UTF-8 and compiled with javac -encoding UTF-8 Foo.java on a Windows system.

Afterwards on a git-bash console (using UTF-8 charset) I do:

$ java Foo
xxõ±xx
$ java -Dfile.encoding=UTF-8 Foo
xx├ñ├▒xx
$ cat test.txt
xxäñxx
$ java Foo | cat
xxäñxx
$ java -Dfile.encoding=UTF-8 Foo | cat
xxäñxx

What is going on here?

Apparently Java checks whether it is connected to a terminal and changes its encoding in that case. Is there a way to force Java to simply output plain UTF-8?


I tried the same with the cmd console, too. Redirecting STDOUT does not seem to make any difference there. Without the file.encoding parameter it outputs ANSI encoding; with the parameter it outputs UTF-8 encoding.

Solution

I'm assuming that your console still runs under cmd.exe. I doubt your console is really expecting UTF-8; I expect it is really using an OEM DOS encoding (e.g. 850 or 437).

Java will encode characters to bytes using the default encoding set during JVM initialization.
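
To see which default a given JVM actually picked, a small check like the following may help. It is only a sketch: the class name ShowDefaultCharset and the describe helper are made up for illustration, and the values printed depend on the platform, the locale, and any -Dfile.encoding argument.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Builds a short report of the JVM's default charset settings.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
            + System.lineSeparator()
            // The system property that -Dfile.encoding=... sets on the command line.
            + "file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as java ShowDefaultCharset and once as java -Dfile.encoding=UTF-8 ShowDefaultCharset shows whether the parameter changes the default. Newer JDK releases changed some of these defaults, so the result also depends on the JDK version.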

Reproducing on my PC:

java Foo

Java encodes as windows-1252; console decodes as IBM850. Result: Mojibake

java -Dfile.encoding=UTF-8 Foo

Java encodes as UTF-8; console decodes as IBM850. Result: Mojibake

cat test.txt

cat decodes file as UTF-8; cat encodes as IBM850; console decodes as IBM850.

java Foo | cat

Java encodes as windows-1252; cat decodes as windows-1252; cat encodes as IBM850; console decodes as IBM850

java -Dfile.encoding=UTF-8 Foo | cat

Java encodes as UTF-8; cat decodes as UTF-8; cat encodes as IBM850; console decodes as IBM850
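
The first two cases can be simulated without a console by encoding the string with one charset and decoding the same bytes with another. This is a sketch under assumptions: the class name MojibakeDemo and its helpers are hypothetical, and it assumes the JRE ships the IBM850 charset (it is an extended charset and may be absent from trimmed-down runtimes). Non-ASCII characters are printed as \u escapes so that the demo's own output does not depend on the console.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text with one charset, then decode the same bytes with another,
    // as a console does when it assumes a different encoding than the writer.
    static String misread(String text, Charset written, Charset assumed) {
        return new String(text.getBytes(written), assumed);
    }

    // Render non-ASCII characters as \\uXXXX so the result is console-independent.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(c < 0x80 ? String.valueOf(c) : String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "xx\u00e4\u00f1xx"; // "xxäñxx", escaped so the source encoding does not matter
        Charset ansi = Charset.forName("windows-1252");
        Charset oem = Charset.forName("IBM850"); // assumption: available in this JRE

        // windows-1252 bytes e4 f1 read as IBM850 give õ and ±
        System.out.println(escape(misread(s, ansi, oem)));
        // UTF-8 bytes c3 a4 c3 b1 read as IBM850 give ├ ñ ├ ▒
        System.out.println(escape(misread(s, StandardCharsets.UTF_8, oem)));
    }
}
```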

This implementation of cat must use heuristics to determine whether the character data is UTF-8, and then transcode the data from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).

This can be confirmed with the following commands:

$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78

$ cat utf8.txt
xxäñxx

$ java HexDump ansi.txt
78 78 e4 f1 78 78

$ cat ansi.txt
xxäñxx

The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
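
A check of this kind can be sketched in Java with a strict decoder that reports malformed input instead of replacing it. This is not how cat is implemented (that is unknown here); the class name Utf8Check and the method isValidUtf8 are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two byte sequences from the hex dumps above.
        byte[] utf8 = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4, (byte) 0xc3, (byte) 0xb1, 0x78, 0x78};
        byte[] ansi = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};

        System.out.println(isValidUtf8(utf8)); // true
        // e4 starts a three-byte sequence, but f1 is not a continuation byte.
        System.out.println(isValidUtf8(ansi)); // false
    }
}
```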

You can correct the Java output by:
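
One option, offered here as a sketch rather than as the answer's own recommendation, is to replace System.out with a PrintStream that encodes in a charset you choose explicitly, so the output no longer depends on the JVM default. The class name Utf8Out and the helper utf8Stream are hypothetical. Whatever reads the bytes (console, cat, a file) still has to decode them as UTF-8 for the text to display correctly.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Wraps any byte stream in an auto-flushing PrintStream that always encodes as UTF-8.
    static PrintStream utf8Stream(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Rebind System.out to the process's standard output, encoded as UTF-8.
        System.setOut(utf8Stream(new FileOutputStream(FileDescriptor.out)));
        System.out.println("xx\u00e4\u00f1xx"); // writes 78 78 c3 a4 c3 b1 78 78 plus a line separator
    }
}
```

To target an OEM console instead, the same pattern could be used with the console's own charset name in place of "UTF-8", assuming that charset is available in the JRE.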

HexDump is a trivial Java application:

import java.io.*;
class HexDump {
  public static void main(String[] args) throws IOException {
    try (InputStream in = new FileInputStream(args[0])) {
      int r;
      while((r = in.read()) != -1) {
        System.out.format("%02x ", 0xFF & r);
      }
      System.out.println();
    }
  }
}
