Reading a file with accented characters in Java


Problem description

I came across two special characters that seem not to be covered by the ISO-8859-1 character set, i.e. they don't make it through to my program:

the German ß and the Norwegian ø.

I'm reading the files as follows:

FileInputStream inputFile = new FileInputStream(corpus[i]);
InputStreamReader ir = new InputStreamReader(inputFile, "ISO-8859-1");

Is there a way for me to read these characters without having to apply manual replacement as a workaround?

[EDIT]

This is how it looks on screen. Note that I have no problems with other accents, e.g. è and the lot...

Solution

Both characters are present in ISO-Latin-1 (check my name to see why I've looked into this).

If the characters are not read in correctly, the most likely cause is that the text in the file is not saved in that encoding, but in something else.
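
To illustrate the kind of mismatch described here, the following is a minimal sketch rather than a diagnosis of the asker's file. It assumes, purely for illustration, that the text was written as UTF-8 and then read as ISO-8859-1; the class name `MismatchDemo` and the helper `roundTrip` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // \u00DF is ß and \u00F8 is ø; the escapes keep this source file pure ASCII.
        String original = "\u00DF\u00F8";

        String matching = roundTrip(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        String mismatched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("same charset on both sides preserves text: "
                + original.equals(matching));
        System.out.println("UTF-8 bytes read as ISO-8859-1 preserve text: "
                + original.equals(mismatched));
        System.out.println("characters after mismatched read: " + mismatched.length());
    }
}
```

With matching charsets the two characters survive unchanged. With the assumed mismatch each character is stored as two bytes and decoded as two unrelated characters, so the string grows from two characters to four.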

Depending on your operating system and the origin of the file, possible encodings include UTF-8 or a legacy DOS/Windows console code page such as 850 or 437.
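
As a rough aid for comparing candidates, the sketch below prints the bytes that several encodings would use for ß and ø. The candidate list is an assumption chosen for illustration (it adds windows-1252 to the encodings named above), and the class name `CandidateEncodings` is hypothetical. Charsets other than ISO-8859-1 and UTF-8 are not guaranteed to be present in every JVM, so each one is checked first.

```java
import java.nio.charset.Charset;

class CandidateEncodings {
    // Hex form of the bytes a charset produces for the text, e.g. "C3 9F".
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"\u00DF", "\u00F8"}; // ß and ø
        // Assumed candidate list; adjust it to the encodings you suspect.
        String[] names = {"ISO-8859-1", "UTF-8", "windows-1252", "IBM850", "IBM437"};

        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            Charset cs = Charset.forName(name);
            for (String s : samples) {
                // Some code pages do not contain every character.
                String shown = cs.newEncoder().canEncode(s) ? hex(s, cs) : "(not encodable)";
                System.out.printf("%-12s U+%04X -> %s%n", name, (int) s.charAt(0), shown);
            }
        }
    }
}
```

Comparing this output with the byte values actually stored in the file narrows down which encoding the file was saved in.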

The easiest way is to look at the file with a hex editor and report back what exact values are saved for these two characters.
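
If no hex editor is at hand, a small program can stand in for one. This is a minimal sketch that lists every byte outside the ASCII range together with its offset; the class name `NonAsciiDump` is hypothetical, and the file path is taken from the command line. It reads the whole file into memory, which is assumed to be acceptable for corpus files of moderate size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class NonAsciiDump {
    // One line per byte >= 0x80: its offset in the data and its value in hex.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int value = data[i] & 0xFF; // treat the byte as unsigned
            if (value >= 0x80) {
                sb.append(String.format("offset %d: 0x%02X\n", i, value));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NonAsciiDump <file>");
            return;
        }
        // Raw bytes only: no charset is involved at this stage.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(dump(data));
    }
}
```

A single byte 0xDF or 0xF8 at the position of ß or ø would be consistent with ISO-8859-1, while the pairs 0xC3 0x9F or 0xC3 0xB8 would point to UTF-8. Other values suggest a different code page.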
