Convert File with known encoding to UTF-8


Question

I need to convert a text file to a String, which I ultimately have to pass as the input parameter (of type InputStream) to IFile.create (Eclipse). I have looked for an example of how to do this but still cannot figure it out. I need your help!
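A minimal sketch of the last step described above: turning text that is already in a Java String into an InputStream. It assumes the stream should carry UTF-8 bytes. The class and method names (`StringToStream`, `toUtf8Stream`) are hypothetical. Eclipse is not called, so the example runs standalone; the resulting stream is the kind of value that `IFile.create(InputStream source, boolean force, IProgressMonitor monitor)` accepts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringToStream {
    // Hypothetical helper: encodes the text as UTF-8 bytes and wraps them in an
    // InputStream, e.g. for passing to IFile.create(stream, force, monitor) in Eclipse.
    public static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = toUtf8Stream("caf\u00e9")) { // "cafe" with U+00E9
            byte[] bytes = in.readAllBytes();              // Java 9+
            System.out.println(bytes.length);              // 5: U+00E9 takes two bytes in UTF-8
        }
    }
}
```

The charset is fixed explicitly in `getBytes` rather than relying on `String.getBytes()` with no argument, which would use the JVM's default charset.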

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis); // no charset given: decodes with the JVM's default charset

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the resulting *.test.txt file is UTF-8 encoded, the characters inside are corrupted.
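One plausible way this kind of corruption arises is decoding bytes with a charset other than the one used to write them. The sketch below reproduces both directions in memory, assuming ISO-8859-1 and UTF-8 as the two charsets involved; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Encode as ISO-8859-1, then (wrongly) decode as UTF-8.
    // U+00E9 becomes the single byte 0xE9, which is not valid UTF-8 on its own,
    // so the String constructor substitutes U+FFFD (the replacement character).
    public static String latin1ReadAsUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8, then (wrongly) decode as ISO-8859-1.
    // U+00E9 becomes the two bytes 0xC3 0xA9, each read back as its own
    // character (U+00C3 followed by U+00A9).
    public static String utf8ReadAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with U+00E9
        System.out.println(latin1ReadAsUtf8(original).equals("caf\uFFFD"));       // true
        System.out.println(utf8ReadAsLatin1(original).equals("caf\u00C3\u00A9")); // true
    }
}
```

Writing the result out as UTF-8 afterwards cannot undo either mistake, because the damage happens at the decoding step.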

Answer

You need to specify the encoding of the InputStreamReader using the Charset parameter.

                                    // ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1"); // throws UnsupportedEncodingException for an unknown name
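Putting the fix together with the question's code, a minimal sketch of the whole conversion might look like the following. It assumes the input is ISO-8859-1 (substitute whatever the real encoding is), reads in chunks instead of one `char` at a time, and uses try-with-resources so both streams are closed. The class name `Transcode` and method `toUtf8` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {
    // Decodes `source` with `inputCharset` and writes the same text to `target` as UTF-8.
    public static void toUtf8(Path source, Path target, Charset inputCharset) throws IOException {
        try (Reader in = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(source), inputCharset));
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // chars are re-encoded as UTF-8 by the writer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("latin1", ".txt");
        Path dst = Files.createTempFile("utf8", ".txt");
        Files.write(src, new byte[] {'c', 'a', 'f', (byte) 0xE9}); // "cafe" with U+00E9, in ISO-8859-1
        toUtf8(src, dst, Charset.forName("ISO-8859-1"));
        System.out.println(Files.readAllBytes(dst).length);      // 5: U+00E9 is 0xC3 0xA9 in UTF-8
        Files.deleteIfExists(src);
        Files.deleteIfExists(dst);
    }
}
```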

See also:

  • InputStreamReader(InputStream in, Charset cs)
  • Charset.forName(String charsetName)
  • Java: How to determine the correct charset encoding of a stream
  • How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
  • GuessEncoding - only works for UTF-8, UTF-16LE, UTF-16BE, and UTF-32 ☹
  • ICU Charset Detector
  • cpdetector, free java codepage detection
  • JCharDet (Java port of Mozilla's charset detector); ironically, that page does not render the apostrophe in "Mozilla's" correctly

SO search where I found all these links: https://stackoverflow.com/search?q=java+detect+encoding

You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
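A minimal sketch of checking the default at runtime. This default is what `new InputStreamReader(fis)` in the question falls back to when no charset is passed. One caveat, stated as background rather than as part of the answer above: since Java 18 (JEP 400) the default is UTF-8 on every platform unless the JVM is started with `-Dfile.encoding=COMPAT`, so on newer JVMs it no longer reflects the operating system's locale. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class DefaultCharset {
    // Name of the charset that readers and writers use when none is specified.
    public static String defaultName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Output depends on the JVM version and platform, e.g. "UTF-8" or "windows-1252".
        System.out.println(defaultName());
    }
}
```

Because the result varies between machines and JVM versions, passing the charset explicitly, as the answer recommends, is the more portable choice.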
