How can I parse XML that conforms to the 1.1 spec using Java and Xerces?

Question

I'm trying to parse a String which contains XML content which conforms to the XML 1.1 spec. The XML contains character references which are not allowed in the XML 1.0 spec but which are allowed in the XML 1.1 spec (character references which translate to Unicode characters in the range U+0001–U+001F).

According to the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it that the XML we are trying to parse is XML 1.1.

I'm using a DocumentBuilder to parse the XML (something like this):

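import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Element parseString(String xmlString) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = dbf.newDocumentBuilder();

        InputSource source = new InputSource(new StringReader(xmlString));

        // Throws org.xml.sax.SAXParseException because of the invalid character refs
        Document doc = documentBuilder.parse(source);

        return doc.getDocumentElement();
    } catch (ParserConfigurationException pce) {
        // Handle the error
    } catch (SAXException se) {
        // Handle the error
    } catch (IOException ioe) {
        // Handle the error
    }
    return null; // reached only when one of the handlers above ran
}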

I've tried setting the XML header to indicate the XML conforms to the 1.1 spec...

xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;

...but it is still parsed as 1.0 XML (still generates the invalid character reference exceptions).

How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser which provides better support for XML 1.1?

Recommended answer

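See the Xerces features page for a list of all the features supported by Xerces. The two features below may be the relevant ones. Note that the second is documented as read-only, so it can be queried but not switched on.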

http://xml.org/sax/features/unicode-normalization-checking

True: Perform Unicode normalization checking (as described in section 2.13 and Appendix B of the XML 1.1 Recommendation) and report normalization errors.

False: Do not report Unicode normalization errors.

http://xml.org/sax/features/xml-1.1

True: The parser supports both XML 1.0 and XML 1.1.
False: The parser supports only XML 1.0.
Access: read-only
Since: Xerces-J 2.7.0
Note: The value of this feature will depend on whether the parser configuration owned by the SAX parser is known to support XML 1.1.
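
Since the xml-1.1 feature is read-only, one way to use it is as a diagnostic: ask the parser that JAXP actually selected whether it reports XML 1.1 support, then try a tiny XML 1.1 document. The following is a minimal sketch rather than a definitive fix. The class name Xml11Check and its helper methods are made up for illustration, and it assumes the parser is obtained through the standard JAXP factories (on a stock JDK that is the bundled Xerces-derived parser). It also assumes the version="1.1" declaration is the very first thing in the input string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of Xerces or JAXP.
public class Xml11Check {
    // Feature id taken from the Xerces feature list quoted above.
    static final String XML_11_FEATURE = "http://xml.org/sax/features/xml-1.1";

    /** Returns true if the JAXP-selected SAX parser reports XML 1.1 support. */
    public static boolean reportsXml11Support() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        try {
            // The feature is read-only, so it is queried, never set.
            return reader.getFeature(XML_11_FEATURE);
        } catch (SAXException e) {
            // Parsers that do not recognize or support the feature end up here.
            return false;
        }
    }

    /** Parses the string with a default DocumentBuilder and returns the root element's text. */
    public static String parseText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Shows which factory implementation JAXP picked up from the classpath.
        System.out.println("Factory: " + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("Reports xml-1.1: " + reportsXml11Support());

        // &#x1; is rejected under XML 1.0 but permitted as a character reference in XML 1.1.
        String xml = "<?xml version=\"1.1\" encoding=\"UTF-8\"?><root>a&#x1;b</root>";
        System.out.println("Text length: " + parseText(xml).length());
    }
}
```

If the feature query returns false, or the printed factory class is not the one expected, an older or different parser on the classpath may be handling the document. If the small document parses but the real input does not, it is worth checking whether the real string already begins with its own XML declaration or leading whitespace before the prepended version="1.1" header.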
