Problems parsing XML in Java


Question

I am having trouble parsing an XML document. For some reason there are text nodes where I would not expect them, so my test fails. The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<RootNode>
  <PR1>PR1</PR1>
  <ROL>one</ROL>
  <ROL>two</ROL>
  <DG1>DG1</DG1>
  <ROL>three</ROL>
  <ZBK>ZBK</ZBK>
  <ROL>four</ROL>
</RootNode>

This snippet of code reproduces the error:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(TestHL7Helper.class.getResourceAsStream("TestHL7HelperInput.xml"));
Node root = doc.getFirstChild();
Node pr1 = root.getFirstChild();

Inspecting the root variable yields [RootNode: null], which seems right, but after that it goes wrong. The pr1 variable turns out to be a text node, [#text:\n ]. Why does the parser treat the newline and the spaces as a text node? Shouldn't they be ignored? I tried changing the encoding, but that did not help either. Any ideas?

If I remove all newlines and spaces so that the XML document sits on a single line, everything works fine.

Answer

XML supports mixed content, meaning an element can have both text and element child nodes. This supports use cases like the following:

<text>I've bolded the <b>important</b> part.</text>

input.xml

This means that, by default, a DOM parser treats the whitespace nodes in the following document as significant (below is a simplified version of your XML document):

<RootNode>
  <PR1>PR1</PR1>
</RootNode>
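
To make the default behaviour concrete, here is a minimal sketch that parses the simplified document above with an unconfigured DocumentBuilderFactory. The class name WhitespaceNodes and the helper firstElementChild are hypothetical names introduced only for illustration, and the XML is inlined as a string rather than read from input.xml. The helper shows one possible way to reach the first element when no schema is involved; it is a workaround sketch, not part of the approach described in this answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class WhitespaceNodes {

    // Simplified document from the text, inlined so the sketch is self-contained.
    static final String XML = "<RootNode>\n  <PR1>PR1</PR1>\n</RootNode>";

    // Parses an XML string with a default (non-validating) DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: returns the first child that is an element, or null if there is none.
    static Node firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node root = parse(XML).getDocumentElement();
        // Whitespace text, the PR1 element, whitespace text: three children in total.
        System.out.println(root.getChildNodes().getLength());
        // The first child is the whitespace text node, not PR1.
        System.out.println(root.getFirstChild().getNodeType() == Node.TEXT_NODE);
        // Skipping non-element siblings reaches PR1.
        System.out.println(firstElementChild(root).getNodeName());
    }
}
```

Applied to the code in the question, calling a helper like this in place of root.getFirstChild() would return the PR1 element instead of the leading whitespace text node.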

Demo Code

If you have an XML schema, you can set the ignoringElementContentWhitespace property on the DocumentBuilderFactory. With the schema available, the DOM parser knows where whitespace is significant and where it is not.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.*;
import javax.xml.validation.*;

import org.w3c.dom.Document;

public class Demo {

    public static void main(String[] args) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema s = sf.newSchema(new File("src/forum16231687/schema.xsd"));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setSchema(s);
        dbf.setIgnoringElementContentWhitespace(true);

        DocumentBuilder db = dbf.newDocumentBuilder();
        Document d = db.parse(new File("src/forum16231687/input.xml"));
        System.out.println(d.getDocumentElement().getChildNodes().getLength());
    }

}

schema.xsd

If you create a schema.xsd that looks like the following, the demo code reports that the root element has 1 child node.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="RootNode">
        <complexType>
            <sequence>
                <element name="PR1" type="string"/>
            </sequence>
        </complexType>
    </element>
</schema>

If you change schema.xsd so that RootNode has mixed content, the demo code reports that RootNode has 3 child nodes.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="RootNode">
        <complexType mixed="true">
            <sequence>
                <element name="PR1" type="string"/>
            </sequence>
        </complexType>
    </element>
</schema>
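
The two schema variants can be compared side by side without any files on disk. The following is a sketch under assumptions: the class name SchemaWhitespaceDemo and its methods are hypothetical, the schema and document are inlined as strings instead of being read from schema.xsd and input.xml, and setNamespaceAware(true) is an addition that the demo code above does not make.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;

public class SchemaWhitespaceDemo {

    // Simplified document from the text, inlined so the sketch is self-contained.
    static final String XML = "<RootNode>\n  <PR1>PR1</PR1>\n</RootNode>";

    // Returns the schema from the text; 'mixed' toggles mixed="true" on the complex type.
    static String schemaSource(boolean mixed) {
        return "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
             + "<element name=\"RootNode\">"
             + "<complexType" + (mixed ? " mixed=\"true\"" : "") + ">"
             + "<sequence><element name=\"PR1\" type=\"string\"/></sequence>"
             + "</complexType></element></schema>";
    }

    // Parses XML against the chosen schema and counts the root element's child nodes.
    static int rootChildCount(boolean mixed) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(schemaSource(mixed))));

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // assumption: not set in the original demo
            dbf.setSchema(schema);
            dbf.setIgnoringElementContentWhitespace(true);

            Document d = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return d.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML against schema", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootChildCount(false)); // element-only content
        System.out.println(rootChildCount(true));  // mixed content
    }
}
```

If the parser behaves as described above, the first call should report 1 child node and the second 3, since only the element-only schema lets the parser classify the whitespace as ignorable.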
