Problems parsing XML in Java

Views: 317 | Published: 2020/6/12 19:03:31 | Tags: java xml xml-parsing document


Problem description

I am having trouble parsing an XML document. Text nodes show up where I would not expect them, and as a result my test turns red. The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<RootNode>
  <PR1>PR1</PR1>
  <ROL>one</ROL>
  <ROL>two</ROL>
  <DG1>DG1</DG1>
  <ROL>three</ROL>
  <ZBK>ZBK</ZBK>
  <ROL>four</ROL>
</RootNode>

This snippet of code reproduces the error:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(TestHL7Helper.class.getResourceAsStream("TestHL7HelperInput.xml"));
Node root = doc.getFirstChild();
Node pr1 = root.getFirstChild();

Inspecting the root variable yields [RootNode: null], which seems right, but then things go wrong. The pr1 variable turns out to be a text node, [#text:\n ]. Why does the parser treat the newline and the spaces as a text node? Shouldn't they be ignored? I tried changing the encoding, but that did not help either. Any ideas?

If I remove all newlines and spaces so that the XML document sits on a single line, everything works fine.

Recommended answer

XML supports mixed content, meaning an element can have both text and element child nodes. This supports use cases like the following:

<text>I've bolded the <b>important</b> part.</text>

input.xml

This means that by default a DOM parser treats the whitespace between the tags in the following document as significant and reports it as text nodes (below is a simplified version of your XML document):

<RootNode>
  <PR1>PR1</PR1>
</RootNode>
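To make the default behavior concrete, here is a minimal sketch that parses the simplified document with a default DocumentBuilderFactory and lists the names of the root element's children. The class name WhitespaceNodesDemo and the helper describeRootChildren are hypothetical names chosen for this illustration; the XML is passed as a string rather than read from a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of the original question or answer.
class WhitespaceNodesDemo {

    // Parses the XML with a default factory (no schema, no validation) and
    // returns the node names of the root element's children in document order.
    static String describeRootChildren(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document d = db.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = d.getDocumentElement().getChildNodes();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                // Text nodes report the fixed name "#text".
                sb.append(children.item(i).getNodeName());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Indented document: whitespace before and after <PR1> becomes text nodes.
        System.out.println(describeRootChildren("<RootNode>\n  <PR1>PR1</PR1>\n</RootNode>"));
        // Same document on one line: there is no whitespace between the tags.
        System.out.println(describeRootChildren("<RootNode><PR1>PR1</PR1></RootNode>"));
    }
}
```

For the indented document the children are a text node, the PR1 element, and another text node; for the single-line document only PR1 remains, which matches the observation in the question that the one-line version works.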

Demo Code

If you have an XML schema, you can set the ignoringElementContentWhitespace property on the DocumentBuilderFactory. With a schema in place, the DOM parser knows if and when whitespace is significant.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.*;
import javax.xml.validation.*;

import org.w3c.dom.Document;

public class Demo {

    public static void main(String[] args) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema s = sf.newSchema(new File("src/forum16231687/schema.xsd"));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setSchema(s);
        dbf.setIgnoringElementContentWhitespace(true);

        DocumentBuilder db = dbf.newDocumentBuilder();
        Document d = db.parse(new File("src/forum16231687/input.xml"));
        System.out.println(d.getDocumentElement().getChildNodes().getLength());
    }

}

schema.xsd

If you create a schema.xsd that looks like the following, the demo code will report that the root element has 1 child node.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="RootNode">
        <complexType>
            <sequence>
                <element name="PR1" type="string"/>
            </sequence>
        </complexType>
    </element>
</schema>

If you change schema.xsd so that RootNode has mixed content, the demo code will report that RootNode has 3 child nodes.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
    <element name="RootNode">
        <complexType mixed="true">
            <sequence>
                <element name="PR1" type="string"/>
            </sequence>
        </complexType>
    </element>
</schema>
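When no schema is available, one possible workaround is to leave the parser settings alone and skip non-element children while walking the tree. This is not part of the answer above; it is a rough sketch under that assumption, and the class name ElementChildrenDemo and its helpers elementChildren and elementChildNames are hypothetical names introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original answer.
class ElementChildrenDemo {

    // Returns only the element children of parent, skipping text nodes
    // (including whitespace-only ones), comments, and other node types.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Parses the XML without a schema and returns the tag names of the
    // root element's element children in document order.
    static List<String> elementChildNames(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            for (Element e : elementChildren(d.getDocumentElement())) {
                names.add(e.getTagName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<RootNode>\n  <PR1>PR1</PR1>\n  <ROL>one</ROL>\n</RootNode>";
        System.out.println(elementChildNames(xml));
    }
}
```

The trade-off is that every traversal has to filter explicitly, whereas the schema-based approach removes the ignorable whitespace once at parse time.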
