How to strip whitespace-only text nodes from a DOM before serialization?
Question
I have some Java (5.0) code that constructs a DOM from various (cached) data sources, removes certain element nodes that are not required, and then serializes the result into an XML string using:
// Serialize DOM back into a string
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
return out.toString();
However, because I remove several element nodes, the whitespace-only text nodes that surrounded them are left behind, and the final serialized document ends up with a lot of extra whitespace.
Is there a simple way to remove or collapse this extraneous whitespace in the DOM before (or while) it is serialized into a String?
Recommended answer
You can find empty or whitespace-only text nodes using XPath, then remove them programmatically like so:
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find text nodes that are empty or contain only whitespace.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
        "//text()[normalize-space(.) = '']");
NodeList emptyTextNodes = (NodeList)
        xpathExp.evaluate(doc, XPathConstants.NODESET);
// Remove each matching text node from the document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
    Node emptyTextNode = emptyTextNodes.item(i);
    emptyTextNode.getParentNode().removeChild(emptyTextNode);
}
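To try the snippet above end to end, here is a minimal sketch that parses a small document, removes an unwanted element, strips the whitespace-only text nodes, and serializes the result with the settings from the question. The class name StripWhitespaceDemo, the sample XML, and the element name "drop" are made up for illustration. The call to doc.normalize() is an added precaution, not part of the original answer: it merges the adjacent text nodes that element removal can leave behind, so the XPath query sees each whitespace run as a single node.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample data are illustrative only.
class StripWhitespaceDemo {

    // Removes every text node that is empty or contains only whitespace.
    static void stripWhitespaceNodes(Document doc) throws Exception {
        // Assumption: merge adjacent text nodes first so each run is one node.
        doc.normalize();
        XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("//text()[normalize-space(.) = '']");
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            n.getParentNode().removeChild(n);
        }
    }

    // Serializes the DOM with the same output properties as the question.
    static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses xml, drops all <drop> elements (a made-up name), strips, serializes.
    static String process(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getElementsByTagName returns a live list, so keep removing item(0).
        NodeList unwanted = doc.getElementsByTagName("drop");
        while (unwanted.getLength() > 0) {
            Node n = unwanted.item(0);
            n.getParentNode().removeChild(n);
        }
        stripWhitespaceNodes(doc);
        return serialize(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <keep>a</keep>\n  <drop>b</drop>\n  <keep>c</keep>\n</root>";
        System.out.println(process(xml));
    }
}
```

Note that the XPath predicate also matches a lone space between two inline elements in mixed content, so this is best suited to data-oriented XML where whitespace between elements carries no meaning.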
This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
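For comparison, the XSL route mentioned above could look roughly like the following sketch: an identity stylesheet with xsl:strip-space is passed to the Transformer, so whitespace-only text nodes are dropped while serializing and the DOM itself is left unmodified. The class name StripSpaceXsltDemo, the helper names, and the stylesheet are illustrative assumptions and do not come from the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and stylesheet are illustrative only.
class StripSpaceXsltDemo {

    // Identity transform that ignores whitespace-only text nodes in all elements.
    static final String STRIP_XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Helper for the demo: parses an XML string into a DOM.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes doc through the stylesheet, using the question's output settings.
    static String serializeStripped(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_XSLT)));
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root>\n  <keep>a</keep>\n  <keep>c</keep>\n</root>");
        System.out.println(serializeStripped(doc));
    }
}
```

The stylesheet strips whitespace everywhere. Limiting it to particular elements would mean listing them in the elements attribute or adding xsl:preserve-space, which is where the XPath approach can be easier to control.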