How to read a file in Java with specific character encoding?

Problem description

    I am trying to read a file in as either UTF-8 or Windows-1252 depending on the output of this method:

    public Charset getCorrectCharsetToApply() {
        // Returns a Charset for either UTF-8 or Windows-1252.
    }
    

    So far, I have:

    String fileName = getFileNameToReadFromUserInput();
    InputStream is = new ByteArrayInputStream(fileName.getBytes());
    InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
    BufferedReader buffReader = new BufferedReader(isr);
    

    The problem I'm having is converting the BufferedReader instance to a FileReader.

    Furthermore:

    • The name of the file itself (fileName) cannot be trusted to be in a particular Charset; sometimes the file name will contain UTF-8 characters, and sometimes Windows-1252. The same goes for the file's content (however, the file name and file content will always have matching charsets).
    • Only the logic inside getCorrectCharsetToApply() can select the charset to apply, so attempting to read a file by its name before calling this method could very well result in Java trying to read the file name with the wrong encoding... which causes it to die!

    Thanks in advance!

    Solution

    So, first, as a heads up, do realize that fileName.getBytes() as you have there gets the bytes of the filename, not the file itself.
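
To make that first point concrete, here is a minimal sketch that mirrors the question's stream setup. The class name FileNameBytesDemo, the helper readFirstLineFromNameBytes, and the file name "notes.txt" are all hypothetical and chosen only for illustration. No file is ever opened, so what comes out of the reader is the name that went in.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FileNameBytesDemo {

    // Mirrors the question's code: the stream wraps the bytes of the
    // file NAME, so the file system is never touched.
    static String readFirstLineFromNameBytes(String fileName, Charset charset) {
        byte[] nameBytes = fileName.getBytes(charset);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(nameBytes), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "notes.txt" is a hypothetical name; the file does not need to exist.
        System.out.println(readFirstLineFromNameBytes("notes.txt", StandardCharsets.UTF_8));
    }
}
```

Running main prints notes.txt, which is the name itself rather than anything stored on disk.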

    Second, the docs of FileReader say:

    The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

    So, sounds like FileReader actually isn't the way to go. If we take the advice in the docs, then you should just change your code to have:

    String fileName = getFileNameToReadFromUserInput();
    FileInputStream is = new FileInputStream(fileName);
    InputStreamReader isr = new InputStreamReader(is, getCorrectCharsetToApply());
    BufferedReader buffReader = new BufferedReader(isr);
    

    and not try to make a FileReader at all.
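
For reference, here is a self-contained sketch of the same approach with try-with-resources added so the stream is closed after reading. The class name CharsetFileReadDemo and the helpers readAll and roundTrip are hypothetical, and the temporary file stands in for the user-supplied file name. Looking up the charset with Charset.forName("windows-1252") assumes the JDK in use ships that charset, which standard builds do.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class CharsetFileReadDemo {

    // Reads every line of the file, decoding its bytes with the given charset.
    // Line terminators are normalized to '\n'.
    static String readAll(String fileName, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Demo helper (hypothetical): encodes text into a temporary file with one
    // charset, reads it back with another, and deletes the file afterwards.
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        try {
            Path sample = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(sample, text.getBytes(writeWith));
                return readAll(sample.toString(), readWith);
            } finally {
                Files.deleteIfExists(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        // "caf\u00e9" contains one non-ASCII character, stored as the single byte 0xE9.
        System.out.print(roundTrip("caf\u00e9\n", windows1252, windows1252));
    }
}
```

Reading the same bytes back with a different charset, such as UTF-8, would not reproduce the original text, which is why the charset returned by getCorrectCharsetToApply() has to be passed to the InputStreamReader. On newer JDKs there are shorter routes to the same result: Files.newBufferedReader(Path, Charset) has been available since Java 7, and FileReader gained constructors that accept a Charset in Java 11.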
