How to remove unwanted tags from XML
Question
I have a huge XML file and I want to remove unwanted tags from it. For example:
<orgs>
  <org name="Test1">
    <item>a</item>
    <item>b</item>
  </org>
  <org name="Test2">
    <item>c</item>
    <item>b</item>
    <item>e</item>
  </org>
</orgs>
I want to remove all the <item>b</item> elements from this XML. Which parser API should I use, given that the XML is very large, and how can I achieve it?
Recommended answer
One approach is to use the Document Object Model (DOM). The drawback, as the name suggests, is that it needs to load the entire document into memory, and Java's DOM API is quite memory hungry. The benefit is that you can take advantage of XPath to find the offending nodes.
Take a closer look at the Java API for XML Processing (JAXP) for more details and other APIs.
Step 1: Load the document
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("..."));
Step 2: Find the offending nodes
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xExpress = xPath.compile("/orgs/org/item[text()='b']");
NodeList nodeList = (NodeList) xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
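One detail worth checking before deleting anything is what the XPath predicate actually matches. The sketch below is an illustration only: the class name XPathSelectDemo and the helper countMatches are hypothetical, and the sample adds an item with padded text (" b ") that is not in the question. It suggests that text()='b' matches only an exact text node, while normalize-space(.)='b' also catches values with surrounding whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSelectDemo {

    // Hypothetical helper: parses the XML string and counts the nodes
    // selected by the given XPath expression.
    static int countMatches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Sample from the question, except that the second "b" is padded with spaces.
        String xml = "<orgs>"
                + "<org name=\"Test1\"><item>a</item><item>b</item></org>"
                + "<org name=\"Test2\"><item>c</item><item> b </item><item>e</item></org>"
                + "</orgs>";

        // Exact comparison against the text node: the padded item is not matched.
        System.out.println(countMatches(xml, "/orgs/org/item[text()='b']"));            // prints 1

        // Whitespace-tolerant comparison: both items are matched.
        System.out.println(countMatches(xml, "/orgs/org/item[normalize-space(.)='b']")); // prints 2
    }
}
```

If the real data never pads its values, the expression from the answer is sufficient as it stands.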
Step 3: Remove the offending nodes
Okay, this is not as simple as it should be. Removing a node can leave blank space behind in the document, which would be "nice" to clean up. The following is a simple library method, adapted from code found on the internet, that removes the specified Node and also cleans up the text nodes (such as leftover whitespace) among its siblings.
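import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static void removeNode(Node node) {
    if (node == null) {
        return;
    }
    Node parent = node.getParentNode();
    if (parent == null) {
        return;
    }
    // Removing the node detaches its whole subtree, so the children
    // do not need to be removed one by one first.
    parent.removeChild(node);

    // Collect the whitespace-only text siblings before removing them:
    // getChildNodes() returns a live list that shrinks as nodes are removed.
    NodeList childNodes = parent.getChildNodes();
    List<Node> blankTextNodes = new ArrayList<>(childNodes.getLength());
    for (int index = 0; index < childNodes.getLength(); index++) {
        Node childNode = childNodes.item(index);
        if (childNode.getNodeType() == Node.TEXT_NODE
                && childNode.getTextContent().trim().isEmpty()) {
            blankTextNodes.add(childNode);
        }
    }
    // Text nodes with real content are left alone, so mixed content survives.
    for (Node blank : blankTextNodes) {
        parent.removeChild(blank);
    }
}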
Loop through the offending nodes...
for (int index = 0; index < nodeList.getLength(); index++) {
    Node node = nodeList.item(index);
    removeNode(node);
}
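To try the first three steps together without a file on disk, the following minimal sketch runs them against an in-memory copy of the sample. The class DomRemoveDemo and the helper removeAndListItems are hypothetical names used only for illustration. The sample string is written without whitespace between elements, so the whitespace clean-up is left out here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomRemoveDemo {

    // Hypothetical helper: removes every node selected by the XPath expression
    // and returns the text of the <item> elements that remain, in document order.
    static List<String> removeAndListItems(String xml, String expression) throws Exception {
        // Step 1: load the document (from a string here instead of a File).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Step 2: find the offending nodes.
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);

        // Step 3: detach each offending node from its parent.
        for (int i = 0; i < hits.getLength(); i++) {
            Node hit = hits.item(i);
            hit.getParentNode().removeChild(hit);
        }

        // Collect what is left so the result can be inspected.
        NodeList rest = doc.getElementsByTagName("item");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < rest.getLength(); i++) {
            texts.add(rest.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orgs>"
                + "<org name=\"Test1\"><item>a</item><item>b</item></org>"
                + "<org name=\"Test2\"><item>c</item><item>b</item><item>e</item></org>"
                + "</orgs>";
        System.out.println(removeAndListItems(xml, "/orgs/org/item[text()='b']")); // prints [a, c, e]
    }
}
```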
Step 4: Save the result
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.setOutputProperty(OutputKeys.METHOD, "xml");
tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
DOMSource domSource = new DOMSource(doc);
StreamResult sr = new StreamResult(System.out);
tf.transform(domSource, sr);
Which outputs something like...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<orgs>
  <org name="Test1">
    <item>a</item>
  </org>
  <org name="Test2">
    <item>c</item>
    <item>e</item>
  </org>
</orgs>
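Because the DOM approach holds the whole document in memory, a streaming API may be a better fit when the file is truly huge. The sketch below is one possible alternative using StAX (javax.xml.stream), which is part of JAXP but is not what the answer above uses. The class StaxItemFilter and the method filter are hypothetical names. It buffers one <item> element at a time and only approximates the XPath test by comparing the element's direct text content.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxItemFilter {

    // Hypothetical helper: copies the XML from 'in' to 'out', skipping every
    // <item> element whose direct text content equals 'unwanted'.
    static void filter(Reader in, Writer out, String unwanted) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            boolean isItemStart = event.isStartElement()
                    && "item".equals(event.asStartElement().getName().getLocalPart());
            if (!isItemStart) {
                writer.add(event); // everything else is copied through unchanged
                continue;
            }

            // Buffer just this one element until its matching end tag.
            List<XMLEvent> buffered = new ArrayList<>();
            StringBuilder text = new StringBuilder();
            buffered.add(event);
            int depth = 1;
            while (depth > 0) {
                XMLEvent inner = reader.nextEvent();
                buffered.add(inner);
                if (inner.isStartElement()) {
                    depth++;
                } else if (inner.isEndElement()) {
                    depth--;
                } else if (inner.isCharacters() && depth == 1) {
                    text.append(inner.asCharacters().getData());
                }
            }

            // Write the element back only if it is not the unwanted one.
            if (!text.toString().equals(unwanted)) {
                for (XMLEvent kept : buffered) {
                    writer.add(kept);
                }
            }
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orgs>"
                + "<org name=\"Test1\"><item>a</item><item>b</item></org>"
                + "<org name=\"Test2\"><item>c</item><item>b</item><item>e</item></org>"
                + "</orgs>";
        StringWriter out = new StringWriter();
        filter(new StringReader(xml), out, "b");
        System.out.println(out);
    }
}
```

For a real file, the same factories also accept an InputStream and an OutputStream, which lets the parser pick up the encoding from the XML declaration. Note that this sketch does not remove the whitespace that preceded a dropped element, so indented input will keep a blank line where each removed item used to be.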