Java FileReader encoding issue


Question

I tried to use java.io.FileReader to read some text files and convert them into a string, but the result is wrongly encoded and not readable at all.

Here's my environment:

  • Windows 2003, OS encoding: CP1252

  • Java 5.0

My files are either UTF-8 encoded or CP1252 encoded, and some of them (the UTF-8 encoded files) may contain Chinese (non-Latin) characters.

I use the following code to do my work:

   private static String readFileAsString(String filePath)
    throws java.io.IOException{
        StringBuffer fileData = new StringBuffer(1000);
        FileReader fileReader = new FileReader(filePath);
        //System.out.println(fileReader.getEncoding());
        BufferedReader reader = new BufferedReader(fileReader);
        char[] buf = new char[1024];
        int numRead=0;
        while((numRead=reader.read(buf)) != -1){
            String readData = String.valueOf(buf, 0, numRead);
            fileData.append(readData);
            buf = new char[1024];
        }
        reader.close();
        return fileData.toString();
    }

The above code doesn't work. I found that the FileReader's encoding is CP1252 even when the text is UTF-8 encoded. But the JavaDoc of java.io.FileReader says:


The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate.

Does this mean that I am not required to set the character encoding myself if I am using FileReader? But I currently get wrongly encoded data, so what's the correct way to deal with my situation? Thanks.

Recommended answer

Yes, you need to specify the encoding of the file you want to read.

Yes, this means that you have to know the encoding of the file you want to read.

No, there is no general way to guess the encoding of any given "plain text" file.
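When the candidates are known to be only UTF-8 and CP1252, as in the question, a narrow heuristic is possible: try a strict UTF-8 decode and fall back to CP1252 if the bytes are not valid UTF-8. The sketch below is an illustration under that assumption, not a general detector; the class name Utf8OrCp1252 and its methods are hypothetical, and a CP1252 file whose bytes happen to form valid UTF-8 would be misread.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class Utf8OrCp1252 {

    // Decodes the bytes as strict UTF-8; if they are not valid UTF-8,
    // falls back to windows-1252 (CP1252), which accepts any byte sequence.
    static String decode(byte[] bytes) {
        try {
            return Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException notUtf8) {
            return Charset.forName("windows-1252")
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        }
    }

    // Reads the whole file into memory as raw bytes, then decodes it.
    static String readFileAsString(String filePath) throws IOException {
        InputStream in = new FileInputStream(filePath);
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int numRead;
            while ((numRead = in.read(buf)) != -1) {
                collected.write(buf, 0, numRead);
            }
            return decode(collected.toByteArray());
        } finally {
            in.close();
        }
    }
}
```

Pure ASCII content decodes identically under both encodings, so the fallback only matters for files that contain non-ASCII bytes.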

FileReader always uses the platform default encoding, which is generally a bad idea.

Instead of FileReader, you need to use new InputStreamReader(new FileInputStream(pathToFile), <encoding>).
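As a minimal sketch of that advice, the question's readFileAsString could be rewritten roughly as follows. The class name ExplicitEncodingReader and the extra charsetName parameter are assumptions added for illustration; the caller still has to know which encoding each file uses.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class ExplicitEncodingReader {

    // Reads the whole file, decoding its bytes with the named charset
    // (for example "UTF-8" or "windows-1252") instead of the platform default.
    static String readFileAsString(String filePath, String charsetName)
            throws IOException {
        StringBuilder fileData = new StringBuilder(1000);
        Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), charsetName));
        try {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                fileData.append(buf, 0, numRead);
            }
        } finally {
            // Closing the outer reader also closes the underlying stream.
            reader.close();
        }
        return fileData.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ExplicitEncodingReader <file> <charset>");
            return;
        }
        System.out.println(readFileAsString(args[0], args[1]));
    }
}
```

This version sticks to APIs available in Java 5, matching the question's environment. On Java 11 and later, FileReader itself also offers constructors that take a Charset, which serves the same purpose.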
