Java, Unicode, UTF-8, and Windows Command Prompt


Question

I have a jar file that is supposed to read a UTF-8 encoded file (which I wrote in a text editor under Windows) and display the characters on the screen. Under OS X and Linux this works flawlessly. I'm having a bit of trouble getting it to work under Windows though... I've defined a reader and writer like so:

FileInputStream file = new FileInputStream(args[0]);
InputStreamReader reader = new InputStreamReader(file, "UTF8");

PrintStream writer = new PrintStream(System.out, true, "UTF8");

I've also changed the command prompt font to Lucida Console and the character encoding to UTF-8 with chcp 65001, in that order.

Now, when I run java -jar Read.jar file.txt, the prompt spits this out:

áéí
ñóú
[]óú
[]

However, if I run type file.txt, the prompt correctly displays the file's contents.

áéí
ñóú

I've tried saving my file with and without a BOM, but that hasn't made a difference. (UTF-8 doesn't even need a BOM because it has no endianness, correct?) I've tried compiling with javac -encoding utf8 *.java, but the same thing happens.

I'm out of ideas now. Anyone care to help?

Recommended answer

Code page 65001 is broken. The MS C runtime stdio functions return inaccurate counts of bytes read and written when run under 65001, which leads to strange behaviours like this one.
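
To make the byte-count problem concrete, here is a small simulation in plain Java. It does not touch the real console or the C runtime. It assumes, as one commonly cited explanation of the bug, that the write call reports the number of characters shown rather than the number of bytes consumed, so a caller that retries "the rest" resends the tail of the buffer. The class name Cp65001Simulation and the helper buggyWrite are hypothetical names invented for this sketch, and the input assumes LF line endings.

```java
import java.nio.charset.StandardCharsets;

public class Cp65001Simulation {

    // Hypothetical stand-in for the faulty write call (an assumption, not the
    // real CRT code): it "displays" all the bytes it is given, but returns the
    // number of characters shown instead of the number of bytes consumed.
    static int buggyWrite(byte[] buf, int off, int len, StringBuilder screen) {
        // Malformed UTF-8 (such as a lone continuation byte) decodes to U+FFFD.
        String decoded = new String(buf, off, len, StandardCharsets.UTF_8);
        screen.append(decoded);
        return decoded.length();
    }

    // A typical "write until everything is written" loop, trusting the count.
    static String simulate(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder screen = new StringBuilder();
        int off = 0;
        while (off < bytes.length) {
            off += buggyWrite(bytes, off, bytes.length - off, screen);
        }
        return screen.toString();
    }

    public static void main(String[] args) {
        // Same text as file.txt in the question, written with Unicode escapes
        // so the source file's own encoding does not matter.
        String shown = simulate("\u00e1\u00e9\u00ed\n\u00f1\u00f3\u00fa\n");
        // Show the replacement character as '?' to keep the output ASCII-safe.
        System.out.println(shown.replace('\uFFFD', '?'));
    }
}
```

Under these assumptions the 14-byte buffer is reported as 8 "bytes" written, so the loop resends from the middle of the ñ, then again from the middle of the ú. The simulated screen therefore contains the full text followed by two shrinking, partly garbled tails, which has the same shape as the output shown in the question.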

It's not fixable - you can't reliably use the Windows console for Unicode I/O from applications that use the C stdlib byte-I/O functions (which includes Java). You can hack it by calling the Win32 API function WriteConsoleW to get Unicode content directly to the Console, but then you have to worry about detecting when stdout actually is a console (not redirected to file).
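
The detection step can be sketched in plain Java. This is only a rough illustration of the branching, not the WriteConsoleW hack itself, which would need native code (for example via JNI) and is not shown. It assumes that System.console() returning null means the program is not attached to an interactive console. That check also depends on stdin and can vary between JDK versions, so treat it as an approximation. The class name StdoutTarget and the helper describeTarget are hypothetical.

```java
import java.io.Console;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class StdoutTarget {

    // Hypothetical helper: classify the output target from the Console object.
    // Assumption: a null Console means output is redirected or non-interactive.
    static String describeTarget(Console console) {
        return console != null ? "console" : "redirected";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u00e1\u00e9\u00ed"; // sample non-ASCII text
        Console console = System.console();
        if (describeTarget(console).equals("console")) {
            // Character-based path: this is where a native console call
            // would go in the hack the answer describes.
            console.writer().println(text);
            console.flush();
        } else {
            // Byte-based path: when redirected to a file or pipe, plain
            // UTF-8 bytes are what the consumer expects.
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.println(text);
        }
    }
}
```

Keeping the two paths separate means redirected output stays valid UTF-8, while any console-specific workaround is confined to the interactive case.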

This is a long-standing source of woe which MS shows no interest in fixing.
