How to let the SAX parser determine the encoding from the XML declaration?

Problem description

I'm trying to parse XML files from different sources (over which I have little control). Most of them are encoded in UTF-8 and cause no problems with the following snippet:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
FeedHandler handler = new FeedHandler();
InputSource is = new InputSource(getInputStream());
parser.parse(is, handler);

Since SAX defaults to UTF-8, this is fine. However, some of the documents declare:

<?xml version="1.0" encoding="ISO-8859-1"?>

Even though ISO-8859-1 is declared, SAX still defaults to UTF-8. Only if I add:

is.setEncoding("ISO-8859-1");

will SAX use the correct encoding.

How can I let SAX automatically detect the correct encoding from the XML declaration without setting it explicitly? I need this because I don't know beforehand what the encoding of a file will be.

Thanks in advance, Allan

Recommended answer

Pass an InputStream to the InputSource when you want SAX to autodetect the encoding.

If you want to set a specific encoding, use a Reader created with that encoding, or call the setEncoding method.

Why? Because encoding autodetection algorithms need the raw bytes, not data that has already been converted to characters.
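
To illustrate the difference, here is a minimal sketch that parses the same ISO-8859-1 bytes twice: once through a byte stream and once through a Reader that was (wrongly) created as UTF-8. The class name SaxEncodingDemo, the TextCollector handler, and the sample document are made up for this example and are not from the original question. It assumes the parser bundled with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEncodingDemo {

    /** Hypothetical handler that simply collects all character data. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the given source and returns the collected text content. */
    public static String parse(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        parser.parse(source, handler);
        return handler.text.toString();
    }

    /** Sample document, encoded as ISO-8859-1 as its declaration states. */
    public static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Byte stream: the parser sees raw bytes and can honor the declaration. */
    public static String parseFromStream(byte[] bytes) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(bytes)));
    }

    /** Character stream: decoding already happened, the declaration is ignored. */
    public static String parseFromReader(byte[] bytes) throws Exception {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = sampleBytes();
        System.out.println("from stream: " + parseFromStream(bytes));
        System.out.println("from reader: " + parseFromReader(bytes));
    }
}
```

In this sketch, the byte-stream variant should return the text with the accented character intact, while the Reader variant returns damaged text, because the bytes were decoded with the wrong charset before the parser ever saw them. Calling setEncoding on an InputSource that wraps a byte stream is the explicit override for cases where the declaration is missing or known to be wrong.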

The question in the subject is: how to let the SAX parser determine the encoding from the XML declaration? I found Allan's answer to the question misleading, so I provided this alternative, based on Jörn Horstmann's comment and my own later experience.
