How to ignore whitespace while reading a file to produce an XML DOM

Views: 178 · Posted: 2018/12/4 13:09:14 · Tags: java xml whitespace

Question

I'm trying to read a file to produce a DOM Document. The file contains whitespace and newlines that I want to ignore, but I haven't managed to:

DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
docfactory.setIgnoringElementContentWhitespace(true);

The Javadoc says that the setIgnoringElementContentWhitespace method only takes effect when the validating flag is enabled, but I don't have a DTD or XML Schema for the document.

What can I do?

Update

I don't like the idea of introducing <!ELEMENT ...> declarations myself, and I have tried the solution proposed in the forum that Tomalak pointed to, but it doesn't work. I am using Java 1.6 in a Linux environment. If nothing better is proposed, I will write a few methods to ignore whitespace text nodes.

Recommended answer

'IgnoringElementContentWhitespace' is not about removing all pure-whitespace text nodes. It only removes whitespace nodes whose parents are described in the schema as having ELEMENT content, that is, elements that only contain other elements and never text.

If you don't have a schema (DTD or XSD) in use, element content defaults to MIXED, so this parameter will never have any effect. (Unless the parser provides a non-standard DOM extension to treat all unknown elements as containing ELEMENT content, which, as far as I know, the parsers available for Java do not.)

You could hack the document on the way into the parser to include the schema information, for example by adding an internal subset to the <!DOCTYPE ... [...]> declaration containing <!ELEMENT ...> declarations, and then use the IgnoringElementContentWhitespace parameter together with validation.
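As a rough sketch of the internal-subset approach, the example below parses a small document whose DOCTYPE declares that root has element content. The class name InternalSubsetDemo, the sample XML, and the element names root and item are all hypothetical, chosen only for illustration. Validation is switched on because the Javadoc ties setIgnoringElementContentWhitespace to the validating flag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names and the sample document are illustrative only.
final class InternalSubsetDemo {

    // The internal subset declares that <root> contains only <item> elements
    // (ELEMENT content), while <item> holds text.
    static final String XML =
        "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n"
        + "  <item>a</item>\n"
        + "  <item>b</item>\n"
        + "</root>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The parser needs the DTD's content models to know which whitespace is ignorable.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // Only the <item> elements should remain as children of <root>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The drawback is the one raised in the question's update: every element in the real document needs a matching declaration, otherwise validation reports errors.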

Or, possibly easier, you could just strip out the whitespace nodes, either in a post-process or as they come in, using an LSParserFilter.
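A minimal sketch of the post-process variant, assuming it is acceptable to drop every text node that consists only of whitespace: the helper below walks the tree after parsing and removes those nodes. The class name WhitespaceStripper and its methods are hypothetical. Note that String.trim() treats all characters up to U+0020 as whitespace, which is slightly broader than the XML definition.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of any library.
final class WhitespaceStripper {

    // Recursively removes text nodes that contain nothing but whitespace.
    static void strip(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards because the NodeList is live and shrinks on removal.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
        }
    }

    // Parses without any DTD or schema, then strips whitespace-only text nodes.
    static Document parseAndStrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        strip(doc.getDocumentElement());
        return doc;
    }
}
```

Because there is no schema to distinguish ELEMENT from MIXED content, this also removes whitespace that may be significant in mixed content, such as a single space between two inline elements, so it fits data-oriented documents better than text-oriented ones.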
