How to strip whitespace-only text nodes from a DOM before serialization?
I have some Java (5.0) code that constructs a DOM from various (cached) data sources, then removes certain element nodes that are not required, then serializes the result into an XML string using:
// Serialize DOM back into a string
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
return out.toString();
However, since I'm removing several element nodes, I end up with a lot of extra whitespace in the final serialized document.
Is there a simple way to remove/collapse the extraneous whitespace from the DOM before (or while) it's serialized into a String?
You can find empty text nodes using XPath, then remove them programmatically like so:
XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
        "//text()[normalize-space(.) = '']");
NodeList emptyTextNodes = (NodeList)
        xpathExp.evaluate(doc, XPathConstants.NODESET);
// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
    Node emptyTextNode = emptyTextNodes.item(i);
    emptyTextNode.getParentNode().removeChild(emptyTextNode);
}
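For anyone who wants to try this end to end, the snippet can be wrapped into a small runnable sketch that parses a string, strips the whitespace-only text nodes, and serializes with the same Transformer settings as in the question. The class name `WhitespaceStripDemo`, its method names, and the sample XML are made up for illustration; they are not from the original code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
public class WhitespaceStripDemo {

    // Removes every text node whose content is empty or whitespace-only.
    static void stripWhitespaceOnlyText(Document doc) throws Exception {
        XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("//text()[normalize-space(.) = '']");
        // The NODESET result is a snapshot, so removing nodes while
        // iterating over it does not shift the remaining indexes.
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
        }
    }

    // Serializes the DOM with the output properties used in the question.
    static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Convenience wrapper: parse, strip, serialize.
    static String stripAndSerialize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceOnlyText(doc);
        return serialize(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <keep>value</keep>\n  \n  <keep>a b</keep>\n</root>";
        System.out.println(stripAndSerialize(xml));
    }
}
```

Note that the predicate only matches text nodes that are entirely whitespace, so the space inside `a b` is left alone. It also does not look at `xml:space="preserve"`, so documents that rely on that attribute would need a narrower XPath.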
This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
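For comparison, the XSL route could look roughly like the sketch below: an identity stylesheet with `xsl:strip-space` is handed to the Transformer, so the stripping happens while serializing and the DOM itself is left untouched. This is only a sketch under assumptions: the class name `StripSpaceXsltDemo`, the inline stylesheet, and the sample input are illustrative, and the exact whitespace handling may vary with the XSLT processor on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
public class StripSpaceXsltDemo {

    // Identity transform that strips whitespace-only text nodes
    // from all elements of the source tree.
    static final String STRIP_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Serializes the DOM through the stylesheet above.
    static String serializeStripped(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_XSL)));
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Convenience wrapper: parse, then serialize with stripping.
    static String stripAndSerialize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return serializeStripped(doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stripAndSerialize("<a>\n  <b>x</b>\n  \n</a>"));
    }
}
```

The trade-off is the one described above: the stylesheet is compact, but deciding per node whether to keep or drop whitespace is easier in Java code than in template rules.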