How to strip whitespace-only text nodes from a DOM before serialization?

Views: 127  Published: 2017/6/24 20:59:03  Tags: java xml dom whitespace

Problem description

I have some Java (5.0) code that constructs a DOM from various (cached) data sources, then removes certain element nodes that are not required, then serializes the result into an XML string using:

// Serialize DOM back into a string
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
return out.toString();

However, since I'm removing several element nodes, I end up with a lot of extra whitespace in the final serialized document.

Is there a simple way to remove/collapse the extraneous whitespace from the DOM before (or while) it's serialized into a String?

Solution

You can find empty text nodes using XPath, then remove them programmatically like so:

XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
        "//text()[normalize-space(.) = '']");  
NodeList emptyTextNodes = (NodeList) 
        xpathExp.evaluate(doc, XPathConstants.NODESET);

// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
    Node emptyTextNode = emptyTextNodes.item(i);
    emptyTextNode.getParentNode().removeChild(emptyTextNode);
}
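
For readers who want to try this end to end, the following is a minimal, self-contained sketch that combines the removal loop above with the serialization code from the question. The class name `StripWhitespaceDemo`, the helper names `stripWhitespaceNodes` and `serialize`, and the sample XML are assumptions made for illustration; they do not come from the original post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
public class StripWhitespaceDemo {

    // Removes every text node whose content is empty or whitespace-only.
    static void stripWhitespaceNodes(Document doc) throws Exception {
        NodeList emptyTextNodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        // The XPath result is a snapshot, so removing nodes while iterating is safe.
        for (int i = 0; i < emptyTextNodes.getLength(); i++) {
            Node emptyTextNode = emptyTextNodes.item(i);
            emptyTextNode.getParentNode().removeChild(emptyTextNode);
        }
    }

    // Serializes the DOM with the same output properties as in the question.
    static String serialize(Document doc) throws Exception {
        StringWriter out = new StringWriter();
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.INDENT, "no");
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input with the kind of leftover whitespace described in the question.
        String xml = "<root>\n  <keep>text</keep>\n  \n  <other>x</other>\n</root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceNodes(doc);
        System.out.println(serialize(doc));
    }
}
```

Note that this removes all whitespace-only text nodes, including any inside elements where whitespace is significant; if that matters, the XPath predicate would need to be narrowed accordingly.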

This approach might be useful if you want more control over node removal than is easily achieved with an XSL template.
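
For comparison, the XSL template alternative mentioned above could look roughly like the sketch below: an identity transform plus an empty template that matches whitespace-only text nodes, so they are dropped during serialization. This is an illustration rather than part of the original answer; the class name `XslStripDemo`, the method `serializeStripped`, and the stylesheet text are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
public class XslStripDemo {

    // Identity transform, plus an empty template that swallows
    // text nodes containing nothing but whitespace.
    static final String STRIP_XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'"
        + " encoding='UTF-8' indent='no'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()[normalize-space(.) = '']\"/>"
        + "</xsl:stylesheet>";

    // Serializes the DOM through the stylesheet; the DOM itself is left unchanged.
    static String serializeStripped(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_XSL)));
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <keep>text</keep>\n  \n</root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(serializeStripped(doc));
    }
}
```

The trade-off is that the stylesheet filters whitespace while serializing and leaves the in-memory DOM untouched, whereas the XPath loop modifies the DOM and lets ordinary Java code decide which nodes to remove.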
