How to let the SAX parser determine the encoding from the XML declaration?

Question

I'm trying to parse XML files from different sources (over which I have little control). Most of them are encoded in UTF-8 and don't cause any problems with the following snippet:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
FeedHandler handler = new FeedHandler();
InputSource is = new InputSource(getInputStream());
parser.parse(is, handler);

Since SAX defaults to UTF-8, this is fine. However, some of the documents declare:

<?xml version="1.0" encoding="ISO-8859-1"?>

Even though ISO-8859-1 is declared, SAX still defaults to UTF-8. Only if I add:

is.setEncoding("ISO-8859-1");

will SAX use the correct encoding.

How can I let SAX automatically detect the correct encoding from the XML declaration without setting it explicitly? I need this because I don't know beforehand what the encoding of the file will be.

Thanks in advance, Allan

Recommended answer

Use an InputStream as the argument to InputSource when you want SAX to autodetect the encoding.
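As a minimal sketch of this advice, the class below feeds a byte stream to InputSource without calling setEncoding, so the parser can read the encoding from the XML declaration itself. The class name AutoDetectEncodingDemo, the helper parseText, and the sample document are hypothetical and only serve the illustration; the sketch assumes the default JDK SAX implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AutoDetectEncodingDemo {

    // Parses raw XML bytes through a byte-stream InputSource and
    // returns the concatenated character data of the document.
    static String parseText(byte[] xmlBytes) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            // No Reader and no setEncoding: the parser sees the raw bytes
            // and can honor the encoding named in the XML declaration.
            InputSource source = new InputSource(in);
            parser.parse(source, handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, encoded as ISO-8859-1 on purpose.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<title>caf\u00e9</title>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1Bytes));
    }
}
```

If the stream were wrapped in a Reader with the wrong charset before reaching the parser, the declaration could no longer correct the decoding, which is why the byte stream is passed directly here.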

If you want to set a specific encoding, use a Reader with a specified encoding or the setEncoding method.
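The two ways of forcing an encoding could look roughly like the sketch below. The class ExplicitEncodingDemo and its helper methods are hypothetical names, and the sample input is assumed to be an ISO-8859-1 document that carries no encoding declaration, which is the kind of case where an explicit setting is useful.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitEncodingDemo {

    // Runs a SAX parse on the given source and collects all character data.
    static String parse(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        parser.parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    // Option 1: decode the bytes yourself and hand the parser a Reader.
    static String parseWithReader(byte[] xmlBytes, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charset);
        return parse(new InputSource(reader));
    }

    // Option 2: keep the byte stream but name the encoding on the InputSource.
    static String parseWithSetEncoding(byte[] xmlBytes, String encodingName) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        source.setEncoding(encodingName);
        return parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: ISO-8859-1 bytes without any XML declaration.
        byte[] latin1Bytes = "<title>caf\u00e9</title>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWithReader(latin1Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(parseWithSetEncoding(latin1Bytes, "ISO-8859-1"));
    }
}
```

Both variants override whatever the document says about itself, so they only make sense when the caller really knows the encoding.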

Why? Because encoding autodetection algorithms need the raw bytes, not data that has already been converted to characters.

The question in the subject is: how to let the SAX parser determine the encoding from the XML declaration? I found Allan's answer to the question misleading, so I am providing this alternative, based on Jörn Horstmann's comment and my own later experience.
