Reading UTF-8 - BOM marker
Question
I'm reading a file through a FileReader. The file is UTF-8 encoded (with a BOM). My problem is that when I read the file and output it as a string, the BOM marker is output as well. Why does this happen?
fr = new FileReader(file);
br = new BufferedReader(fr);
String tmp = null;
while ((tmp = br.readLine()) != null) {
    String text;
    text = new String(tmp.getBytes(), "UTF-8");
    content += text + System.getProperty("line.separator");
}
Output after the first line:
?<style>
Recommended answer
In Java, you have to consume the UTF-8 BOM manually if it is present. This behaviour is documented in two reports in the Java bug database. It will not be fixed for now, because a fix would break existing tools such as JavaDoc or XML parsers. Apache Commons IO provides a BOMInputStream to handle this situation.
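As a rough illustration of consuming the BOM by hand, here is a minimal sketch using only the JDK. The class name BomAwareReader and the method readUtf8 are hypothetical, not from the original post. It assumes the input really is UTF-8, decodes it with an explicit charset instead of relying on FileReader's default, and drops a leading U+FEFF character if one is found.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption made for this sketch.
class BomAwareReader {

    // U+FEFF is what the BOM bytes EF BB BF decode to in UTF-8.
    private static final int BOM = 0xFEFF;

    // Reads the whole stream as UTF-8 text and drops one leading BOM, if any.
    // Every line in the result is terminated with the platform line separator.
    static String readUtf8(InputStream in) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            br.mark(1);              // remember the position before the first character
            if (br.read() != BOM) {
                br.reset();          // first character was content (or EOF): rewind
            }
            String line;
            while ((line = br.readLine()) != null) {
                content.append(line).append(System.lineSeparator());
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }
}
```

It could be called as BomAwareReader.readUtf8(new FileInputStream(file)). Note that the re-encoding step from the question, new String(tmp.getBytes(), "UTF-8"), is not needed once the reader decodes with the right charset. If Commons IO is on the classpath, wrapping the stream in the BOMInputStream mentioned above before handing it to the InputStreamReader should have a similar effect.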
Take a look at this solution: Handle UTF8 file with BOM