How to remove unwanted tags from XML

Question

I have a huge XML file and I want to remove unwanted tags from it. For example:

<orgs>
    <org name="Test1">
        <item>a</item>
        <item>b</item>
    </org>
    <org name="Test2">
        <item>c</item>
        <item>b</item>
        <item>e</item>
    </org>
</orgs>

I want to remove every <item>b</item> element from this XML. Since the file is very large, which parser API should I use, and how can I achieve this?

Recommended answer

One approach would be to use the Document Object Model (DOM). The drawback is that, as the name suggests, it needs to load the entire document into memory, and Java's DOM API is quite memory hungry. The benefit is that you can use XPath to find the offending nodes.

Take a closer look at the Java API for XML Processing (JAXP) for more details and for the other available APIs.

Step 1: Load the document

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("..."));

Step 2: Find the offending nodes

XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xExpress = xPath.compile("/orgs/org/item[text()='b']");
NodeList nodeList = (NodeList) xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
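To see what the expression in this step selects, here is a minimal, self-contained sketch. The class name FindOffendingNodes, the helper countMatches, and the inline sample document (which adds a padded " b " item) are assumptions made for illustration; they are not part of the original answer. It compares the exact text() test with a normalize-space() variant, which may be useful if the real data contains stray whitespace around the value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindOffendingNodes {

    // Hypothetical helper: returns how many nodes in the XML match the XPath expression.
    static int countMatches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList matches = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return matches.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Sample data: one exact "b" item and one padded " b " item.
        String xml = "<orgs>"
                + "<org name=\"Test1\"><item>a</item><item>b</item></org>"
                + "<org name=\"Test2\"><item>c</item><item> b </item><item>e</item></org>"
                + "</orgs>";
        // Exact comparison: the padded " b " item is not matched. Prints 1.
        System.out.println(countMatches(xml, "/orgs/org/item[text()='b']"));
        // normalize-space() trims the text before comparing. Prints 2.
        System.out.println(countMatches(xml, "/orgs/org/item[normalize-space()='b']"));
    }
}
```

Which variant is appropriate depends on whether padded values should count as matches in your data.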

Step 3: Remove the offending nodes

Okay, this is not as simple as it should be. Removing a node can leave a blank space in the document, which would be "nice" to clean up. The following is a simple library method, adapted from code I found on the internet, which removes the specified Node and also removes the whitespace text nodes left behind in its parent.

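// Requires java.util.ArrayList, java.util.List, org.w3c.dom.Node and org.w3c.dom.NodeList
public static void removeNode(Node node) {
    if (node == null) {
        return;
    }
    Node parent = node.getParentNode();
    if (parent == null) {
        return;
    }
    // Detaching the node also detaches its whole subtree
    parent.removeChild(node);

    // Collect the parent's whitespace-only text nodes first: getChildNodes()
    // is a live list, so removing while iterating over it would skip entries
    NodeList childNodes = parent.getChildNodes();
    List<Node> blankTextNodes = new ArrayList<>(childNodes.getLength());
    for (int index = 0; index < childNodes.getLength(); index++) {
        Node childNode = childNodes.item(index);
        // Only blank text is dropped, so meaningful mixed-content text survives
        if (childNode.getNodeType() == Node.TEXT_NODE
                && childNode.getNodeValue().trim().isEmpty()) {
            blankTextNodes.add(childNode);
        }
    }
    for (Node blankTextNode : blankTextNodes) {
        parent.removeChild(blankTextNode);
    }
}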

Loop over the offending nodes...

for (int index = 0; index < nodeList.getLength(); index++) {
    Node node = nodeList.item(index);
    removeNode(node);
}
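To check the first three steps together on the sample document, a minimal sketch might look like the following. The class name RemoveItemsDemo and the helper remainingItems are hypothetical, the XML is parsed from an inline string rather than a file, and a plain removeChild call stands in for the removeNode helper to keep the example short. The matches are copied into a list before anything is removed, as a defensive assumption that avoids relying on how the XPath result list behaves while the tree changes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveItemsDemo {

    // Hypothetical helper: parses the XML, removes every node matched by the
    // expression, and returns the text of the remaining <item> elements.
    static List<String> remainingItems(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList matches = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);

        // Copy the matches first so that removal cannot disturb the iteration.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            toRemove.add(matches.item(i));
        }
        for (Node node : toRemove) {
            node.getParentNode().removeChild(node);
        }

        // Collect what is left, in document order.
        List<String> remaining = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            remaining.add(items.item(i).getTextContent());
        }
        return remaining;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orgs>"
                + "<org name=\"Test1\"><item>a</item><item>b</item></org>"
                + "<org name=\"Test2\"><item>c</item><item>b</item><item>e</item></org>"
                + "</orgs>";
        // Prints [a, c, e]
        System.out.println(remainingItems(xml, "/orgs/org/item[text()='b']"));
    }
}
```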

Step 4: Save the result

Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.setOutputProperty(OutputKeys.METHOD, "xml");
tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

DOMSource domSource = new DOMSource(doc);
StreamResult sr = new StreamResult(System.out);
tf.transform(domSource, sr);

Which outputs something like...

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<orgs>
  <org name="Test1">
    <item>a</item>
  </org>
  <org name="Test2">
    <item>c</item>
    <item>e</item>
  </org>
</orgs>
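Because the question stresses that the file is very large, and the DOM approach loads everything into memory, a streaming alternative may be worth considering. The following is a rough sketch using StAX (javax.xml.stream), one of the other JAXP APIs; it is not part of the original answer. The class name StreamingItemFilter and the method filter are hypothetical, and the sketch assumes that <item> elements contain only text, as in the sample. It buffers the events of one <item> at a time and writes them out only if the text is not the unwanted value, so memory use stays small regardless of file size.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StreamingItemFilter {

    // Copies XML from in to out, dropping every <item> whose trimmed text equals unwanted.
    // Assumption: <item> elements hold only text (no nested elements).
    static void filter(Reader in, Writer out, String unwanted) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        List<XMLEvent> buffer = new ArrayList<>(); // events of the <item> being inspected
        StringBuilder text = new StringBuilder();  // text content of that <item>
        boolean insideItem = false;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "item".equals(event.asStartElement().getName().getLocalPart())) {
                insideItem = true;
                buffer.clear();
                text.setLength(0);
            }
            if (!insideItem) {
                // Everything outside an <item> is copied through unchanged.
                writer.add(event);
                continue;
            }
            buffer.add(event);
            if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            } else if (event.isEndElement()
                    && "item".equals(event.asEndElement().getName().getLocalPart())) {
                insideItem = false;
                // Write the buffered element only if it is not the unwanted one.
                if (!unwanted.equals(text.toString().trim())) {
                    for (XMLEvent buffered : buffer) {
                        writer.add(buffered);
                    }
                }
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orgs><org name=\"Test1\"><item>a</item><item>b</item></org></orgs>";
        StringWriter out = new StringWriter();
        filter(new StringReader(xml), out, "b");
        System.out.println(out);
    }
}
```

For a real file, the StringReader and StringWriter would be replaced by file-based readers and writers. Note that this sketch leaves the whitespace around a dropped <item> in place, so the output may contain blank lines where elements were removed.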
