Strange XML indentation


Problem description

I'm writing an XML file, and the indentation is coming out slightly wrong:

<BusinessEvents>

<MailEvent>
          <to>Wellington</to>
          <weight>10.0</weight>
          <priority>air priority</priority>
          <volume>10.0</volume>
          <from>Christchurch</from>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <PPW>8.0</PPW>
          <PPV>2.5</PPV>
     </MailEvent>
<DiscontinueEvent>
          <to>Wellington</to>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <from>Sydney</from>
     </DiscontinueEvent>
<RoutePriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <duration>15.0</duration>
          <maxweight>40.0</maxweight>
          <maxvolume>20.0</maxvolume>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <frequency>3.0</frequency>
          <from>Wellington</from>
          <volumecost>2.0</volumecost>
     </RoutePriceUpdateEvent>
<CustomerPriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <priority>air priority</priority>
          <from>Sydney</from>
          <volumecost>2.0</volumecost>
     </CustomerPriceUpdateEvent>
</BusinessEvents>

As you can see, the first child node is not indented at all, yet that node's children are indented twice, and the closing tag is indented only once.

I suspect it might have to do with not adding the root to the document through doc.appendChild(root), but when I do that I get an error:

An attempt was made to insert a node where it is not permitted.

Here is my parser:

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder icBuilder;
        try {
            icBuilder = icFactory.newDocumentBuilder();
            String businessEventsFile = System.getProperty("user.dir") + "/testdata/businessevents/businessevents.xml";
            Document doc = icBuilder.parse (businessEventsFile);

            Element root = doc.getDocumentElement();

            Element element;

            if(event instanceof CustomerPriceUpdateEvent){
                element = doc.createElement("CustomerPriceUpdateEvent");
            }
            else if(event instanceof DiscontinueEvent){
                element = doc.createElement("DiscontinueEvent");
            }
            else if(event instanceof MailEvent){
                element = doc.createElement("MailEvent");
            }
            else if(event instanceof RoutePriceUpdateEvent){
                element = doc.createElement("RoutePriceUpdateEvent");
            }
            else{
                throw new Exception("business event isnt valid");
            }

            for(Map.Entry<String, String> field : event.getFields().entrySet()){
                Element newElement = doc.createElement(field.getKey());
                newElement.appendChild(doc.createTextNode(field.getValue()));
                element.appendChild(newElement);
            }

            root.appendChild(element);


            // output DOM XML to console
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
//            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "5");
            DOMSource source = new DOMSource(doc);
            StreamResult console = new StreamResult(businessEventsFile);
            transformer.transform(source, console);

Any insights would be greatly appreciated.

Recommended answer

I had the same problem a while ago. It turned out that the parsed document contained whitespace-only text nodes throughout.

For example, after parsing the document, you probably have a blank text node right before the <MailEvent> node under the <BusinessEvents> node. The Transformer keeps blank text nodes (which I assume is correct behaviour).
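To make the blank text nodes visible, here is a minimal sketch that counts them directly under the root element. The class name WhitespaceNodes, the helper countBlankTextChildren and the two inline sample documents are made up for illustration; it assumes the JDK's default, non-validating parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceNodes {

    // Hypothetical helper: counts whitespace-only text nodes that are
    // direct children of the document element.
    static int countBlankTextChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        int blank = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                blank++;
            }
        }
        return blank;
    }

    public static void main(String[] args) throws Exception {
        String pretty = "<BusinessEvents>\n    <MailEvent/>\n</BusinessEvents>";
        String compact = "<BusinessEvents><MailEvent/></BusinessEvents>";
        // One blank text node before <MailEvent/> and one after it
        System.out.println(countBlankTextChildren(pretty));   // prints 2
        // No whitespace between the tags, so no text nodes at all
        System.out.println(countBlankTextChildren(compact));  // prints 0
    }
}
```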

So, if there is no whitespace at all between the tags in the XML text, the Transformer indents the tags correctly. You could try this with your code by manually deleting all whitespace, including line breaks, from your input and then formatting it. The output would then probably be closer to what you expect.
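That experiment can also be run in code rather than by editing the file by hand. The sketch below parses a compact string with no whitespace between tags and lets the Transformer indent it. The class name IndentCompact, the helper indent and the sample string are illustrative; it assumes the JDK's built-in TransformerFactory, which honours the {http://xml.apache.org/xslt}indent-amount property used in the question. Other implementations may ignore it.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

public class IndentCompact {

    // Hypothetical helper: parses the XML string and serializes it again
    // with indentation switched on.
    static String indent(String xml, int amount) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific property (assumed: JDK default transformer)
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(amount));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // No whitespace between tags, so the parsed DOM has no blank text nodes
        String compact = "<BusinessEvents><MailEvent><to>Wellington</to>"
                + "</MailEvent></BusinessEvents>";
        System.out.println(indent(compact, 4));
    }
}
```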

One way to solve this is to remove redundant whitespace from the document after it has been parsed. Simply removing all blank text nodes will make the formatting look better, but that causes problems if some of the blank text nodes are actually needed.
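For comparison, the naive "remove every blank text node" variant might look like the sketch below. It uses an XPath query, which is a different technique from the recursive cleanup in this answer, and the names NaiveStrip, parse and stripBlankTextNodes are made up for illustration. It also shows the drawback: an element whose only content is whitespace loses that content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NaiveStrip {

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Removes every whitespace-only text node, with no exceptions.
    static void stripBlankTextNodes(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList blanks = (NodeList) xpath.evaluate(
                "//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
        // The XPath result is a snapshot, so removing nodes while iterating is safe
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a>\n  <b>x</b>\n  <c> </c>\n</a>");
        stripBlankTextNodes(doc);
        // Only <b> and <c> remain under <a>
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
        // The single space inside <c> was removed too, which may be unwanted
        System.out.println(doc.getElementsByTagName("c").item(0).hasChildNodes());
    }
}
```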

So what I did to clean up the document before formatting was to remove all text nodes containing only whitespace, except where the text node is the only child (no siblings).

The method cleanEmptyTextNodes(Node parentNode) below recursively removes those blank text nodes from a subtree.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class FormatXml {

    public static void main(String[] args) throws ParserConfigurationException,
            FileNotFoundException, SAXException, IOException,
            TransformerException {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder documentBuilder = docBuilderFactory
                .newDocumentBuilder();
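        // Close the input stream once the document has been parsed
        try (FileInputStream in = new FileInputStream("data.xml")) {
            Document document = documentBuilder.parse(in);
            System.out.println(format(document, 4));
        }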
    }

    public static String format(Node node, int indent)
            throws TransformerException {
        cleanEmptyTextNodes(node);
        StreamResult result = new StreamResult(new StringWriter());
        getTransformer(indent).transform(new DOMSource(node), result);
        return result.getWriter().toString();
    }

    private static Transformer getTransformer(int indent) {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (Exception e) {
            throw new RuntimeException("Failed to create the Transformer", e);
        }
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));
        return transformer;
    }

    /**
     * Removes text nodes that contain only whitespace. A whitespace-only
     * text node is removed only if its parent node has at least one child
     * of any of the following types:
     * - ELEMENT child
     * - CDATA child
     * - COMMENT child
     *
     * The purpose of this is to make the format() method (which uses a
     * Transformer for formatting) more consistent regarding indenting and
     * line breaks.
     */
    private static void cleanEmptyTextNodes(Node parentNode) {
        boolean removeEmptyTextNodes = false;
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            removeEmptyTextNodes |= checkNodeTypes(childNode);
            childNode = childNode.getNextSibling();
        }

        if (removeEmptyTextNodes) {
            removeEmptyTextNodes(parentNode);
        }
    }

    private static void removeEmptyTextNodes(Node parentNode) {
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            // grab the "nextSibling" before the child node is removed
            Node nextChild = childNode.getNextSibling();

            short nodeType = childNode.getNodeType();
            if (nodeType == Node.TEXT_NODE) {
                boolean containsOnlyWhitespace = childNode.getNodeValue()
                        .trim().isEmpty();
                if (containsOnlyWhitespace) {
                    parentNode.removeChild(childNode);
                }
            }
            childNode = nextChild;
        }
    }

    private static boolean checkNodeTypes(Node childNode) {
        short nodeType = childNode.getNodeType();

        if (nodeType == Node.ELEMENT_NODE) {
            cleanEmptyTextNodes(childNode); // recurse into subtree
        }

        // These sibling types make whitespace-only text nodes redundant
        return nodeType == Node.ELEMENT_NODE
                || nodeType == Node.CDATA_SECTION_NODE
                || nodeType == Node.COMMENT_NODE;
    }

}

The formatted output with your input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BusinessEvents>
    <MailEvent>
        <to>Wellington</to>
        <weight>10.0</weight>
        <priority>air priority</priority>
        <volume>10.0</volume>
        <from>Christchurch</from>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <PPW>8.0</PPW>
        <PPV>2.5</PPV>
    </MailEvent>
    <DiscontinueEvent>
        <to>Wellington</to>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <from>Sydney</from>
    </DiscontinueEvent>
    <RoutePriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <duration>15.0</duration>
        <maxweight>40.0</maxweight>
        <maxvolume>20.0</maxvolume>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <frequency>3.0</frequency>
        <from>Wellington</from>
        <volumecost>2.0</volumecost>
    </RoutePriceUpdateEvent>
    <CustomerPriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <priority>air priority</priority>
        <from>Sydney</from>
        <volumecost>2.0</volumecost>
    </CustomerPriceUpdateEvent>
</BusinessEvents>
