Error when parsing an XML file to DOM

Question

I'm trying to parse an XML file using DocumentBuilderFactory as follows:

DocumentBuilderFactory ndsParserFactory = DocumentBuilderFactory.newInstance( );
ndsParserFactory.setNamespaceAware( true );
DocumentBuilder ndsParser = ndsParserFactory.newDocumentBuilder( );
Document ndsDocument = ndsParser.parse( ndsFileInputStream );

where ndsFileInputStream is an InputStream wrapping the file containing the XML.

I get an exception when the file contains a Unicode character such as Δ. When I strip out the line containing the offending character, the parsing works just fine.

The file contains the declaration <?xml version="1.0" encoding="UTF-8"?>

I'm wondering if I'm neglecting to configure the DocumentBuilderFactory (or DocumentBuilder) instance properly in order to handle the Δ character.

Edit (from comments):

Full disclosure: This is Android, and I'm including XML files (with an NDS file extension) as assets in my Android app. I access them via the AssetManager, which has a handy-dandy method for opening an asset file into an InputStream, which I then pass to the parse method of my DocumentBuilder. – d weld 16 hours ago

I noticed that the assets folder uses an encoding of CP1252 by default for its contents. So I changed that to UTF8. No luck. Then I removed the BOM from one of the NDS files (per link) and tried again. No luck. I'm thinking that the APK file (which is compressed like a ZIP file) is somehow mangling the non-ASCII XML. I think I'll have to resort to getting the NDS files onto the Android device by other means...

Recommended answer

Are you sure the file is really written as UTF-8? Obviously you can open it in some editor and it shows the text correctly, but the editor could just be making a good guess at the encoding.
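One way to check this without trusting an editor is to decode the raw bytes strictly and see whether they are well-formed UTF-8. The following is a minimal sketch in plain Java; the class name Utf8Check and the method isValidUtf8 are made up for illustration and are not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for bad input.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] deltaUtf8 = {(byte) 0xCE, (byte) 0x94}; // U+0394 encoded as UTF-8
        byte[] deltaGreek = {(byte) 0xC4};             // U+0394 encoded as ISO-8859-7
        System.out.println(isValidUtf8(deltaUtf8));    // true
        System.out.println(isValidUtf8(deltaGreek));   // false
    }
}
```

To apply this to the real file, read the whole InputStream into a byte array first and pass that array to isValidUtf8. A false result would suggest the file was saved in some other encoding despite its declaration.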

The other thing to remember is that every character in a UTF-8 file is a Unicode character; the parser is just choking when it hits a byte sequence that isn't valid in the declared encoding. UTF-8 is a very forgiving encoding to use, because any character in the 7-bit ASCII set is encoded exactly as it would be in plain ASCII, and a lot of XML is made up of nothing but plain ASCII characters. This catches people out when something non-ASCII comes along and defects in the text encoding paths through a system suddenly become apparent.
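To make the point about ASCII concrete, here is a small sketch that prints the bytes produced for Δ (U+0394) and for a plain ASCII letter. The class name DeltaBytes and the hex helper are illustrative only, and the sketch assumes the JVM ships the ISO-8859-7 charset, which desktop JDKs normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeltaBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "CE 94". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String delta = "\u0394"; // Greek capital delta, written as an escape
        Charset greek = Charset.forName("ISO-8859-7");

        System.out.println(hex(delta.getBytes(StandardCharsets.UTF_8))); // CE 94
        System.out.println(hex(delta.getBytes(greek)));                  // C4
        // ASCII characters are a single identical byte in both encodings.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_8)));   // 41
        System.out.println(hex("A".getBytes(greek)));                    // 41
    }
}
```

A lone C4 byte is the start of a two-byte UTF-8 sequence, so a file that stores Δ that way but declares UTF-8 will parse fine until the parser reaches that byte.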

You could try editing the XML declaration to see whether the file parses OK under another character encoding; ISO 8859-7 contains the Δ symbol, so could the file be encoded in that?
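If editing the declaration in the file is inconvenient, a similar experiment can be run from code by decoding the stream yourself and handing the parser a character stream. This is only a sketch: the class ParseWithCharset and its parse method are hypothetical names, and it relies on the parser not re-decoding a character stream according to the XML declaration, which is how the JDK's bundled parser behaves as far as I know.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseWithCharset {
    /** Parses the stream as XML, decoding its bytes with the given charset. */
    public static Document parse(InputStream in, String charsetName)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Decode the bytes ourselves; the parser then sees characters,
        // not bytes, so the encoding named in the declaration is not
        // used for decoding.
        Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
        return builder.parse(new InputSource(reader));
    }
}
```

Calling parse(ndsFileInputStream, "ISO-8859-7") and then parse on a fresh stream with "UTF-8" would show which decoding, if either, gets past the Δ character.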

Also, what is the exception?
