Read/write .txt file with special characters

Problem description

I open Notepad (Windows) and write

Some lines with special characters
Special: Žđšćč

and then use Save As... to save it as "someFile.txt" with Encoding set to UTF-8.

In Java I have:

FileInputStream fis = new FileInputStream(new File("someFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);

String line;
while((line = in.readLine()) != null) {                         
    printLine(line);
}
in.close();

But I get question marks and similar "special" characters. Why?

Edit: I have this input (one line in the .txt file):

665,Žđšćč

and this code

FileInputStream fis = new FileInputStream(new File(fileName));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);

String line;
while((line = in.readLine()) != null) {
    Toast.makeText(mContext, line, Toast.LENGTH_LONG).show();

    Pattern p = Pattern.compile(",");
    String[] article = p.split(line);

    Toast.makeText(mContext, article[0], Toast.LENGTH_LONG).show();
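    // makeText(Context, int, int) treats an int as a string resource ID, so pass the number as a String
    Toast.makeText(mContext, String.valueOf(Integer.parseInt(article[0])), Toast.LENGTH_LONG).show();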
}
in.close();

The Toast output (for those unfamiliar with Android: a Toast is just a way to show a pop-up on screen with some text in it) is fine. The console shows "weird characters" (probably because of the console window's encoding). But parsing the integer fails, according to the console (note: the Toast output looks just fine).

It seems the String contains some "weird" characters that Toast doesn't render, but when I try to parse it, it crashes. Any suggestions?

If I choose ANSI encoding in Notepad, it works (the integer parses) and the weird characters disappear, but of course my special characters no longer come through.

Recommended answer

It's the output console that doesn't support those characters. Since you're using Eclipse, you need to make sure it's configured to use UTF-8. You can do this via Window > Preferences > General > Workspace > Text file encoding, set to UTF-8.
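
The console can only show these characters if the bytes Java writes and the encoding the console decodes agree. As a minimal sketch, assuming you control the printing code, you can force the output side to UTF-8 by wrapping System.out in a UTF-8 PrintStream. This only helps once the console itself decodes UTF-8 (the Eclipse setting above). Utf8Console and utf8() are hypothetical names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps an OutputStream so that text printed to it is encoded as UTF-8,
    // independent of the platform default encoding.
    public static PrintStream utf8(OutputStream os) throws UnsupportedEncodingException {
        return new PrintStream(os, true, "UTF-8"); // true = auto-flush on println
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // The escapes spell Žđšćč; they keep this source file ASCII-only
        out.println("Special: \u017D\u0111\u0161\u0107\u010D");
    }
}
```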

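Update: based on the updated question and the comments, the UTF-8 BOM (byte order mark) is apparently the culprit. Notepad adds a UTF-8 BOM by default when saving, and it looks like the JRE on your HTC doesn't swallow it (desktop Java's UTF-8 decoder doesn't strip it either). The BOM is decoded as the invisible character U+FEFF at the start of the first line, so Integer.parseInt() receives "\uFEFF665" instead of "665" and throws a NumberFormatException. Consider using the UnicodeReader example outlined in this answer instead of InputStreamReader in your code; it autodetects and skips the BOM.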

// UnicodeReader is the class from the linked answer; it is not part of the JDK
FileInputStream fis = new FileInputStream(new File(fileName));
UnicodeReader ur = new UnicodeReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(ur);
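
The linked UnicodeReader isn't reproduced here. As a hedged alternative for the UTF-8-only case, a PushbackInputStream can peek at the first three bytes and drop them if they are the UTF-8 BOM (EF BB BF), pushing them back otherwise. BomSkippingReader and utf8Reader() are hypothetical names, and this sketch handles only the UTF-8 BOM.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSkippingReader {
    // Returns a UTF-8 Reader over `in`, dropping a leading UTF-8 BOM (bytes EF BB BF) if present.
    public static Reader utf8Reader(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until we have 3 or hit EOF
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back for the decoder
        }
        return new InputStreamReader(pin, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a Notepad "UTF-8" file: BOM followed by 665,Žđšćč (written as escapes)
        byte[] text = "665,\u017D\u0111\u0161\u0107\u010D".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[text.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(text, 0, withBom, 3, text.length);

        try (BufferedReader in = new BufferedReader(utf8Reader(new ByteArrayInputStream(withBom)))) {
            String line = in.readLine();
            System.out.println(Integer.parseInt(line.split(",")[0])); // prints 665
        }
    }
}
```

In the question's code it would take the place of the InputStreamReader: new BufferedReader(BomSkippingReader.utf8Reader(new FileInputStream(fileName))).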


Unrelated to the actual problem: it's good practice to close resources in a finally block, so that they are guaranteed to be closed even if an exception is thrown.

BufferedReader reader = null;
try {
    reader = new BufferedReader(new UnicodeReader(new FileInputStream(fileName), "UTF-8"));
    // ...
} finally {
    if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
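
Since Java 7, try-with-resources does the same with less ceremony: the reader is closed automatically when the block exits, whether normally or through an exception. A sketch assuming Java 7 or later; ReadLines and readLines() are hypothetical names. The plain InputStreamReader used here does not skip the BOM, so substitute UnicodeReader from the linked answer if your files come from Notepad.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Reads all lines of a UTF-8 text file; the reader is closed even if readLine() throws.
    public static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```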

Also unrelated: I'd suggest moving Pattern p = Pattern.compile(","); outside the loop, or even making it a static constant, because compiling a pattern is relatively expensive and there is no need to do it on every iteration.
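
A sketch of the static-constant variant: the pattern is compiled once, when the class is initialized, and reused for every line. ArticleParser and parseId() are hypothetical names. The BOM problem from the question still applies here: a first line that starts with U+FEFF makes Integer.parseInt throw a NumberFormatException.

```java
import java.util.regex.Pattern;

public class ArticleParser {
    // Compiled once at class initialization instead of on every loop iteration
    private static final Pattern COMMA = Pattern.compile(",");

    // Returns the numeric id before the first comma, e.g. "665,..." -> 665
    public static int parseId(String line) {
        String[] article = COMMA.split(line);
        return Integer.parseInt(article[0]);
    }
}
```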
